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ABSTRACT 


The thesis is divided into two parts. In the first part, we describe and analyze 
several new VLSI layouts for the shuffle-exchange graph. These include 
1) an asymptotically optimal, @(N2//og2N)-area layout for the N-node shuffle- 
exchange graph, and 
2) several practical layouts for small shuffle-exchange graphs. 
The new layouts require substantially less area than previously known layouts 
and can serve as the basis for designing large scale shuffle-exchange chips. 


In the second part of the thesis, we develop general methods for proving lower 
bounds on the layout area, crossing number, bisection width and maximum edge 
length of VLSI networks. Among other things, we use these methods to find 

1) an N-node planar Jpn which has layout area O(N/ogN) and maximum — 
edge length O(N//4//og!/2N), 

2) an N-node graph with an O(N’/2)-separator which has layout area 
O(Niog?N) and maximum edge length O(N/ZiogN/loglogN), and 

3) an N-node graph with an O(N®)-separator (for a>//2) which has maximum 
edge length O(N®). 

The area results indicate that some graphs with O(N/”2)-separators (and, in 
particular, some planar graphs) do not have linear-area layouts, thus disproving a 
popular conjecture. The edge length bounds indicate that the layouts of some 
networks must have very long wires (possibly as long as the width of the layout). 
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INTRODUCTION 


_ The recent engineering advances in Very Large Scale Integrated (VLSI) circuitry 
have made it possible to wire tens of thousands of transistors onto a single chip. In 
the near future, it is expected that fabrication of chips containing millions of. 
transistors will be commonplace [MC80]. In order that this massive computational © 
resource be efficiently utilized, theoretical researchers have been actively trying to 
answer such questions such as: 


1) “What is a good model for VLSI chip design and computation?,” 


2) “What communications networks can best perform important operations 
such as sorting, matrix multiplication and discrete Fourier transform?” and 


3) “What is the best method of laying out a network on a chip?." 


Several models have been proposed for VLSI computation [T80,LS81,CM8]1]. 
The most widely accepted is due to Thompson and is known as the Thompson 
model [T79,T80]. Thompson’s model of a VLSI chip is quite simple. The chip is 
presumed to consist of a grid of vertical and horizontal tracks which are spaced 
apart by unit intervals. Processors are viewed as points and are located only at the 
intersection of grid tracks. Wires are routed through the tracks in order to connect 
pairs of processors. Although a wire in a horizontal track is allowed to cross a wire 
in a vertical track, pairs of wires are not allowed to overlap for any distance (1.e., in 
they cannot.overlap in the same track). Further, wires are not allowed to overlap 
processors to which they are not linked. As an example, we have drawn a 
Thompson model layout of a 4-processor network in Figure 1. 


Figure 1: A Thompson model layout of a 4-processor network in 
which each processor is linked to every other processor. 


Much has also been accomplished in the way of finding good communications 
networks for VLSI. For example, the complete binary tree [MC80], the 2- 
dimensional mesh [TK77,KL78,MC80], the cube-connected-cycles graph [PV79] 
and the shuffle-exchange graph [S71,L75,L76,NS79,P80,S80,SR80a,179,T80] are all 
known to be capable of performing a wide range of operations. The shuffle- 
exchange graph, in particular, is an incredibly powerful and efficient 
communications network. Among other things, it can be used to compute discrete 
Fourier transforms, multiply matrices, sort lists and evaluate polynomials. Except 
for sorting (which requires O(/og?N) time), these operations require no more than 
logarithmic time and constant space per processor. This is exponentially faster than 
the running times of the corresponding sequential algorithms and the 
corresponding parallel algorithms on networks such as the 2-dimensional mesh. 
As, in addition, the processors required for these operations are quite simple, the 
shuffle-exchange network is very well suited for VLSI implementation on a chip. 


The shuffle-exchange graph comes in various sizes. In particular, there is an 
‘N-node shuffle-exchange graph for every N which is a power of two. Each node of 
the (N= 2*)-node shuffle-exchange graph is associated with a unique k-bit binary 
string a,.;+- +a). Two nodes w and w’ are linked via a shuffle edge if w' is a left 
or right cyclic shift of w (ie, if w= ay,-+++dq9 and w'= aj.5++-Qgay., OF 
w'= dy+++dy.,a, , respectively). Two nodes w and w' are linked via an 
exchange edge if w and w' differ only in the last bit (i.e., if w = a,.;---a,0 and 
w'= a,.;-++a@,1 or vice-versa). As an example, we have drawn the 8-node 
shuffle-exchange graph in Figure 2. Note that the shuffle edges are drawn with 
solid lines while the exchange edges are drawn with dashed lines. We shall follow 
this convention throughout the thesis. 
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Figure 2: The 8-node shuffle-exchange graph. 
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The third question of interest to VLSI researchers ("What is the best method of 
laying out a network on a chip?") has proved to be, by far, the most difficult. It is 
also the subject of this thesis. In order to answer the question for a particular 
network, we must do the following three things: 


1) decide what it means for a layout to be “good,” 
2) find a "good" layout for the network, and 
3) prove that the layout is as "good" as possible. 


Most people agree that a “good” layout is one which does not require much 
area. This is quite reasonable since small layouts are easier to wire on a chip, cost 
less and have far higher yields than layouts with larger amounts of area. Recently, 
there has also been interest in designing layouts with short wires. Although wire 
length considerations are not as important as area considerations, it is possible that 
layouts with long wires may ‘be less efficient and run slower (due to longer 
transmission times) than layouts with shorter wires. Both quantities are easily 
expressed in terms of the Thompson model, which is nice from a mathematical 
point of view. For example, the /ayout area of a network is the minimum amount 
of area required to lay out the network in the Thompson model. (The area of a 
layout in the Thompson model is defined to be the product of the number of 
vertical tracks and the number of horizontal tracks which contain a processor or 
wire segment of the layout.) Similarly, the maximum edge length of a network is 
the minimum amount of wire which is needed to embed the longest edge in any 
Thompson model layout of the network. 


Good layouts are known for several communications networks; including the 
complete binary tree [MR79,PRS81,BL81], the 2-dimensional mesh and the cube- 
connected-cycles graph [PV79]. The known layouts for the shuffle-exchange graph, 
however, are not very good. Thompson [T80] was the first to find a nontrivial 
layout for the shuffle-exchange graph. In particular, he found an O(N*/log!/2N)- 
area layout of the N-node shuffle-exchange graph. He also showed that any layout 
for the N-node shuffle-exchange graph must have at least 2(N2/log2N) area. Hoey 
and Leiserson [HL80] improved the upper bound by finding an O(N?/logN)-area 
layout for the N-node shuffle-exchange graph. Neither Thompson's nor Hoey and 
Leiserson’s layouts are practical, however, and neither meets Thompson's 
asymptotic lower bound. 


In Part I of the thesis, we find good layouts for the shuffle-exchange graph. In 
particular, we describe an asymptotically optimal O(N2/log2N)-area layout for the 
N-node shuffle-exchange graph. Although the layout is not optimal for small 
values of N, we show how it can be modified in order to produce good layouts for 
small shuffle-exchange graphs. As these layouts are practical, it should now be 
possible to build a shuffle-exchange chip. | 


Finally, we are left with the task of proving that a layout which appears to be 
good is, in fact, optimal. Although Thompson [T79,T80], Vuillemin [V80] and 
Lipton and Sedgewick [LS81] have all shown how to prove area lower bounds for 
certain computationally useful networks (such as the shuffle-exchange graph), it is 
not known how to prove such lower bounds in general. For example, no nontrivial 
lower bounds have been found for the class of graphs which have O(N//4)- 
separators. (This class includes the very important class of planar graphs.) Nor 
have any methods been discovered. for proving nontrivial lower bounds on the 
maximum edge length of a network. 


In Part II of the thesis, we describe several techniques for proving good layout 
area and maximum edge length lower bounds. In particular, we concentrate on 
finding good lower bounds for the crossing number, wire area and maximum edge 
crossing of a network. The crossing number of a graph is the minimum number of 
pairs of edges which must cross in any drawing of the graph in the plane. The 
maximum edge crossing of a graph is the largest number of edges which must be. 
crossed by some edge in any drawing of the graph. The wire area of a network is 
simply the minimum amount of wire which must be used to embed the network in 
the Thompson model. It is clear that for any network, 


crossing number < wire area < layout area 
and also that 
maximum edge crossing < maximum edge length. 


In addition, the crossing number, wire area and maximum edge crossing are 
worth minimizing independent of layout area and maximum edge length 
considerations. This is due to the fact that 


1) chips with a large number of wire crossings (and, in particular, those with 
wires which cross many other wires) have substantially nore problems with 


capacitive coupling (i.e., interference between overlapping wires) than do 
chips with fewer crossings, and 


2) chips with high wire area cost more and experience lower yields than do 
chips with lesser wire area. 


Unfortunately, the results of Part II indicate that the crossing number and wire 
area are uSually as large (up to a constant factor) as the layout area. In addition, 
the maximum edge crossing is often nearly as large as the side length of the chip. 
More importantly, however, crossing number and wire area arguments can be used 
to prove better lower bounds on the layout area and maximum edge length than 
were possible with existing. eronianes In particular, we will use such arguments 
to find 


1) an N-node planar graph which has layout area O(N/ogN) and maximum 
edge length O(N’/7/log!/2N), 


2) an N-node graph with an O(N/% oe ‘which has taut area 
@(Niog*N) and maximum edge length O(N/”2/ogN/loglogN), and 


3). an N-node graph with an O(N®%)-separator (for a>J/2) which has maximum 
edge length O(N®). 


The area results indicate that not all graphs with O(N//*)-separators (and, in 
particular, not all planar graphs) can be laid out in linear area, thus disproving a 
popular conjecture. The edge length bounds indicate that layouts of certain 
networks must have some very long wires (possibly even as long as the side length 
of the layout). Taken together, these results answer all of the previously open 
questions concerning layout area and maximum edge length of VLSI networks 
with known separators. 


This empty page was substituted for a 
blank page tn the original document. 


PART I 


LAYOUTS FOR THE SHUFFLE- EXCHANGE GRAPH 


CHAPTER 1 
REVIEW OF KNOWN LAYOUTS 


In this chapter, we review the known layouts of the shuffle-exchange graph. In 
section 1.1, we describe Thompson’s [T80] straightforward O(N*/log!/2N)-area 
layout. This is followed in section 1.2 by a detailed description of Hoey and 
Leiserson’s complex plane diagram. The complex plane diagram is very helpful in 
finding good layouts for the shuffle-exchange graph. For example, Hoey and 
Leiserson [HL80] have used the diagram to find an O(N2/logN)-area layout for the 
N-node shuffle-exchange graph. In Chapter 2, we will use the diagram to find a 
variety of layouts for the N-node shuffle-exchange graph including one which 
requires only O(N2/log?/2N) area. (Such a layout has also recently been found 
independently by Steinberg and Rodeh [SR80b].) The complex plane diagram will 
also be used in Chapter 4 as an aide in the construction of good practical layouts 
for small shuffle-exchange graphs. 


1.1. Thompson’s Layout | 


Thompson was the first to investigate VLSI layouts for the shuffle-exchange 
graph. In his thesis [T80], he showed that any layout for the N-node shuffle- 
exchange graph requires at least 2(N2/log?N) area. (We reprove this fact using 
crossing number arguments in Part II of the thesis.) In addition, he described a 
layout requiring only O(N?/log!/2N) area. In what follows, we present 
Thompson’s layout and give a simple proof that it does, in fact, require just 
O(N*/log!/2N) area. 


Given any k-bit string w, define the size of w to be the number of /-bits it 
contains. For example, the size of /0//0 is 3. Thompson’s idea was to lay out the 
N=2* nodes of the shuffle-exchange graph on a straight line in order of 
nondecreasing size. It is easily seen that shuffle edges link nodes which have the 
same size and that exchange edges link nodes which have sizes differing by one. 
Thus the edges of such a layout are relatively short. In particular, the number of 
horizontal tracks needed to embed all of the edges is at most O( max B,) where 


B, is the number of nodes of size s. This is due to ‘the fact that at most 
O(B,.;+8,+B,, ;) edges can cross any vertical cut of the layout which is located 
between a pair of nodes of size s. 


It is easy to show that B, = C(ks) for each s where 


C(ks) = ki [st(k-s)!] 


is the well-known function for binomial coefficients, It is also well-known that 
C(k,s) achieves its maximum value at s= k/2 for any k. Using standard asymptotic 
analysis, it is easily shown that C(k,k/2) ~ @(2*/k!/2) for large k.. (For a good 
review of such techniques, see Bender and Orszag’s book [BO78].) Thus 
Thompson's layout requires only O(N/log!”2N) horizontal tracks. Since at most 3 
vertical tracks are needed to embed the vertical portions of the edges incident to 
any given node, we can conclude that Thompson's layout has area O(N2/log!“2N). 


1.2 Hoey and Leiserson’s Complex Plane Diagram 


_ In [HL80], Hoey and Leiserson observed that there is a very natural embedding 
of the shuffle-exchange graph in the complex plane. In what follows, we describe 
this embedding (henceforth referred to as the complex. plane diagram) and point 
out some of its more important properties. In addition, we give a brief description 
of the method used by Hoey and Leiserson to transform the diagram into an 
O(N2/logN)-area layout for the N-node shuffle-exchange graph. 


12.1 Definition 


Let 6, = e2"k denote the kth primitive root of unity. Given any k-bit binary 
string w = 4;.;--+-@,, let pw) be the map which sends w to the point 


Aw) = “ay. 8%! toon t a5, + a 


in the complex plane. As each node of the (N= 2k)-node shuffle-exchange graph 
corresponds to a k-bit binary string, it is possible to use the map to embed the 
_ shuffle-exchange graph in the complex plane. For example, we have done this for 
the 32-node. shuffle-exchange graph (whence k=5) in Figure 1-1. As is usual, we 
have drawn the shuffle edges with solid lines and the exchange edges with dashed 
lines. For simplicity, each node is labeled with its value instead of its 5-bit binary 
string. (By the value of a node, we mean the numerical value of the associated 
k-bit binary string.) 
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Figure 1-1: The complex plane diagram for the 32-node 
shuffle-exchange graph. (Taken from [HL80].) 


1.2.2 Properties 


Examination of Figure 1-1 indicates that the complex plane diagram has some 
very interesting properties. First, it is apparent that the shuffle edges occur in 
cycles (which we call necklaces) which are symmetrically placed about the origin. 
This phenomenon is easily explained by the following identity: 


5; Way, -- > A) = Ay. 5% + ay 98 yk! + +++ + a8? + 95, 


ay.75,*! Se a5 x + ay.] 


Pay-2+ + + M9Ay.1)- 


It 


Thus traversal of a shuffle edge corresponds to a 21/k rotation in the complex 
plane. 


Except for degenerate cases, the preceding identity also indicates that each 
necklace is composed of k nodes, each a cyclic shift of the other. Such necklaces 
are called full necklaces. Degenerate necklaces contain fewer than k nodes and, 
because they must have some symmetry, are mapped entirely to the origin of the 
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complex plane diagram. For example, {00000} and {0/01, 1010} are degenerate 
necklaces while both {/0/, 011, 110} and {11100, 11001, 10011, OOI11, 01110} are 
full. 


It will often be convenient to. refer to a necklace by one of its nodes. In 
particular, we will use the notation <w> to indicate the necklace generated by w. 
This is simply the collection of cyclic shifts if w. For example, the necklace 
generated by /0] is <Jl0I> = {10I, O11, 110} . 


Exchange edges are also embedded in a very regular fashion by the complex 
plane diagram. In fact, each exchange edge is embedded as a horizontal line 
segment of unit length. This phenomenon is explained by the identity 

Pay.; --- a0) +l= ay. 8,4! teoeet a5, +1 


. = Way; ---a;l). 


In some cases, several exchange edges are contained in the same horizontal line 
of the diagram. Such lines are called /evels. For example, there are 9 levels in the 
diagram of the 32-node shuffle-exchange graph shown in Figure 1-1. We will use 
the properties of levels in Chapter 2 to find an O(N?/log?”2N)-area layout for the 
N-node shuffle-exchange graph. They will also be used in Chapter 4 to find good 
practical layouts for small shuffle-exchange graphs. 


1.2.3 An O(N?/ogN)-Area Layout 


In [HL80], Hoey and Leiserson showed how to use the complex plane diagram 
to construct an O(N7/logN)-area layout for the N-node shuffle-exchange graph. 
Their method was very involved, however, and we have chosen not to include it 
here. The basic idea is to use the structural properties of the complex plane 
diagram to find an O(N/log!’2N)-separator for the N-node shuffle-exchange graph 
whenever N is of the form 22” for some r>0. The separator can then used to 
construct an O(N?/logN)-area layout by using Leiserson’s general layout technique 
for graphs with known separators [L80a]. 


Shortly after writing [HL80], Hoey and Leiserson found a far simpler 
O(N2/logN)-area layout for the N-node shuffle exchange graph which was, in 
addition, valid for all N. By the that time, however, we (as well as several others) 
had also observed that the complex plane diagram could be used to find a simple 
layout for the shuffle-exchange graph. This layout is described in Chapter 2. 
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CHAPTER 2 


LAYOUTS BASED ON THE COMPLEX PLANE DIAGRAM 


In this chapter, we present several layouts of the shuffle-exchange graph which 
are based on Hoey and Leiserson’s complex plane diagram. We commence in 
section 2.1 with a straightforward O(N7/logN)-area layout of the N-node shuffle- 
exchange graph. As we mentioned in Chapter 1, this layout has also been 
discovered by many others (including Hoey and Leiserson). In section 2.2, we 
show how the layout can be modified so as to require only O(N2/log?/2N) area. 
The latter layout was also discovered independently by Steinberg and Rodeh 
[SR80b]. We conclude the chapter by mentioning an additional O(N7/log*/2N)- 
area layout as well as a layout which might require even less area. 


2.1 A Straightforward O(N?/logN)-Area Layout 


In this section, we describe a straightforward layout of the shuffle-exchange 
graph which requires only O(N2/logN) area. The layout is formed from a grid of 
levels and necklaces which we refer to as the /evel-necklace grid. Each row of the 
grid corresponds to a level of the complex plane diagram. The columns are 
divided into consecutive column pairs, each pair corresponding to a necklace. In 
particular, the leftmost column of each column pair corresponds to that part of the 
necklace which is contained in the left half of the complex plane. Similarly, the 
rightmost column corresponds to the part of the necklace contained in the nght 
half of the complex plane. We assume that the rows are ordered from top to 
bottom so as to be consistent with the natural ordering of the levels in the complex 
plane but (for the time being) place no restrictions on the left-to-right order of the 
necklaces. 


Each node of the shuffle-exchange graph is placed at the intersection of the row 
and column of the grid which correspond to the level and part of the necklace (left 
half or right half) to which it belongs in the complex plane diagram. For example, 
we have done this for a random ordering of the necklaces of the 32-node shuffle- 
exchange graph in Figure 2-1. 


necklaces 


<3> <7><31><1l1> <1> <5><0> <15> 


levels 


wow On KD NO F&F WD FE 


_ Figure 2-1: A level-necklace grid for the sevens SHU excnanEe graph. 


Notice that we used just one vertical track to embed the necklaces <0> and <3J> 
in the grid. As each necklace contains just one node, it is clear that this is 
sufficient. In general, necklaces which are mapped to the origin by the complex 
plane diagram are a nuisance since they become lumped together in a single point 
of the level-necklace grid. Fortunately, there are relatively few such nodes. In 
particular, Hoey and Leiserson showed the following. 


Lemma 2-1 (Hoey and Leiserson [HL80]): At most O(N/logN) nodes of the N- 
_ node shuffle-exchange graph are mapped to the origin of the complex plane diagram. 


Proof: Every node which is mapped to the origin of the complex plane diagram 
is adjacent (via an exchange edge) to a node at position (/,0) or (-/,0). Any node 
which is not mapped to the origin is contained in some full necklace, at most two 
nodes of which are contained in positions (/,0) or (-/,0). Thus for every pair of 
nodes which are mapped to the origin, there are at least k = logN nodes which 
are not mapped to the origin. Thus at most O(N/k) = O(N/logN) nodes can be 
mapped to the origin O . 


Since at most O(N/logN) nodes are mapped to the origin, we can (for the time 
being) ignore them. They can always be inserted later at a cost of. at most 
O(N/logN) additional vertical and horizontal tracks. Since any layout of the 
shuffle-exchange graph which we will consider will have at least Q(N//ogN) vertical 
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and horizontal tracks, the added tracks can increase the area of the final layout by 
at most a constant factor. We will also use this strategy in Chapter 3 when we 
ignore several O(N/logN)-sized sets of nodes. 


Since each full necklace contains at most k = logN nodes, it is easy to see that 
the N-node shuffle-exchange graph has at most O(N/logN) full necklaces. Thus at 
most O(N/logN) vertical tracks are needed to embed all of the shuffle edges in the 
level-necklace grid. It is also easy to show that at most N horizontal tracks are 
needed to embed all of the exchange edges (one track is used for each exchange 
edge). Thus the total area of the layout for the N-node shuffle-exchange graph is 
O(N2/logN). As an example, we have added the edges of the 32-node shuffle- 
exchange graph to the level-necklace grid in Figure 2-1 to produce the layout 
shown in Figure 2-2. Note that we have omitted <@> and <3/> in this layout since 
they are mapped to the origin of the complex plane diagram. 


necklaces 


<3>  <7> <11l> <1l> <5> <15> 


levels 


Figure 2-2: Layout produced from the level-necklace grid shown in Figure 2-1. 
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2.2. An Improved O(N?/fog3/2N)-Area Layout 


It is possible to improve the layout described in section 2.1 by reducing the 
number of horizontal tracks needed to embed the exchange edges. This can be 
done in two ways. First, exchange edges which are in the same level of the 
complex plane diagram but which do not overlap in the level-necklace grid can be 
inserted on the same horizontal track. As more exchange edges are inserted on the © 
same track, fewer total tracks will be needed to embed all of the exchange edges. 
Secondly, the necklaces can be re-ordered so as to increase the average number of 
exchange exchange edges which can be inserted on each horizontal track. 


Although we do not know how to best order the necklaces in general, we have 
found several orderings which yield O(N7/log?/2N)-area layouts for the N-node 
shuffle-exchange graph. For instance, we will show in what follows that such a 
layout can be constructed by arranging the necklaces from left to right in order of 
nondecreasing size. (The size of a necklace is simply defined to be the size of any 
of its nodes.) This observation has also been made by Steinberg and Rodeh in 
[SR80b]. . 


In order to bound the number of horizontal tracks needed to insert the exchange 
edges, we will show that the maximum overlap of exchange edges on each level 
occurs in between necklaces of size k/2. Since the maximum overlap of exchange 
edges on each level is an upper bound on the number of horizontal tracks needed 
to insert the exchange edges on that level, we can thus conclude that the total 
number of horizontal tracks needed to insert all of the exchange edges is at most 


O(B yp) ” OWN log!“2N) . 
Thus the resulting layout will have area at most O(N2/log?/2N). 


It is not immediately clear why the maximum overlap on each level occurs 
between nodes of size k/2, however. In what follows, we break up each level into 
sublevels (for which the analysis is easier) and show that the maximum overlap on 
each sublevel occurs between necklaces of size k/2, Before doing this, however, we 
must introduce some further notation. 


Consider a node of the form a,.,;--+a,0 for which either a,.;=0 or a;=0 or 
both for each i<k. We will refer to such a node as basis node A node 
by.;+++5g is said to be generated by the basis node a,.;---ag if 
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2) by.;=5; whenever a,.;=a;=0 for 1<i<k. 
For example, /0000 generates /0001, 11100 and 1/110] but not ////1J. 


It is not difficult to show that if u generates v, then both u and v are on the same 
level of the complex plane diagram. For example, let u = aj.;-++@ and 
v = by,-++bg and observe that 


Av) - pu) 


(by.77 Gy-1) §,¢! + coe t (b, - a)) 5, + (bg - ag) 
= cp 8K +... + 6/8, + Cp 


where c,.,=c; foreachii 1<i<k. Since 5,*! is the complex conjugate of 
5,/ for 1<i<k, wecan conclude that p(v) - p(u) is a real number and thus 
that u and v are in the same level of the complex plane diagram. 


It is also easy to show that each node of the shuffle-exchange graph is generated 
by a unique basis node. In particular, the node which generates b,.;---bg can 
be found by 


) setting by=0 and (if k is even) setting by, 2=0, and ~ 
2) setting bj=b,.;=0 for each i such that (originally) 5,=b,.;=1. 


Since exchange edges “link nodes which are in the same sublevel, we can 
conclude from the preceding arguments that it is possible to partition each level of 
the complex plane diagram into sublevels so that the nodes in each sublevel are 
precisely the nodes generated by some basis node. We will now show that the 
maximum overlap at each sublevel occurs between necklaces of size k/2. 


Since the necklaces have been arranged from left to right in order of 
nondecreasing size, we can use arguments similar to those of section 1.1 to 
conclude that the overlap of exchange edges between two nodes of size s in any 
sublevel is at most OC max B,') where B,' is the number of nodes in that 
sublevel with size s. A qaishiforward counting argument shows that each basis 
node of size r generates 


1) C(k/2 - +, i) nodes of size s=r+2i for any i< k/2-r, and 
2) C(k/2 - r, i) nodes of size s=r+2i+1 for any i< KW2-r 
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when k is odd, and 


1) C(k42-r-1, i) + Ckh/2-r-Li-lh = C(k/2-,r, 2 nodes of size 
s=r+2i for any i< k/2-r, and 


2) 2C(k/2 - r- I, i) nodes of size s=r+2i+1 for any i< k/2-r- ] 


when k is even. We can therefore conclude that in all cases, the maximum value : 
of B,' occurs when i = (k- 2r)/4 and thus when s=k/2. This concludes the 
proof. . 


As an example, we have drawn such a layout for the 32-node shuffle-exchange 
graph in Figure 2-3. Note that far fewer horizontal tracks are needed for this 
layout than are used for the layout in Figure 2-2. For completeness, we have 
included the necklaces <0> and <3J> even though they are degenerate. 


necklaces 


<7>  <1ll> <15><32> 


levels 


Figure 2-3: An improved layout for the 32-node shuffle-exchange graph. 
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2.3. Other Layouts 


It is not difficult to find other orderings of the necklaces which produce 
O(N?/log?/2N)-area layouts for the N-node shuffle-exchange graph. For example, 
Lepley [LLM8]] used standard statistical methods to show that the arrangement of 
necklaces from left to right in order of nondecreasing radius produces such a 
layout. (By the radius of a necklace, we mean the radius of the circle in the 
complex plane which contains the necklace.) The proof is similar to the one in 
section 2.2. In particular, it is shown that the maximum overlap in most levels 
occurs in the same place and that the total overlap of all of the levels at that point 
is O(N/log!/2N). ; 


Although we consider it likely that better orderings of the necklaces exist, we do 
not know of any ordering which (provably) results in a layout with less than 
o(N?/log’2N) area. There is another ordering of interest, however. That is the 
ordering of the necklaces according to the minimal number represented by each 
necklace. (The minimum number represented by a necklace is simply the smallest 
value of any node in the necklace.) Coincidentally, the layout displayed in Figure 
2-3 has such an ordering. Using techniques which are developed in Chapter 3, it is 
possible to show that the combined maximum overlap of exchange edges in all 
levels is at most O(NloglogN/logN) for this ordering. This is substantially better 
than the O(N/log!”2N) overlap found in previous orderings and also very close to 
the lower bound of Q(N/logN). Unfortunately, we do not know how to show that 
the maximum overlap at each level occurs in the same place. In fact, it appears 
that this may not be the case. (We are deeply indebted to Kleitman for pointing 
out the possibility of such an improvement. Although we were not able use his 
idea in the context of complex plane diagram layouts, it was crucial to the 
development of the asymptotically optimal layout described in Chapter 3.) 


For orderings which have a small combined maximal overlap but for which the 
maximal overlap at each level is difficult to compute (such as the ordering by 
minimal value represented), it may be possible to improve the situation by altering 
the level structure. As Miller pointed out to us, there are many possible levelings 
of the exchange edges. (By a /eveling, we mean any arrangement of the exchange 
edges in levels which is consistent with the necklace structure of the complex plane 
diagram.) Although we have investigated several levelings, we have not found any 
(provably) better layouts for the shuffle-exchange graph by this method. 
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CHAPTER 3 


MORE SOPHISTICATED LAYOUTS 


In section 3.3 of this chapter, we describe an asymptotically optimal 
O(N?/log?N)-area layout for the N-node shuffle-exchange graph. Unlike the 
previously described layouts, the optimal layout is fairly sophisticated and requires 
a substantial amount of preliminary machinery. Most of the necessary definitions 
and lemmas are included in section 3.1. In section 3.2, we describe and analyze a 
near-optimal preliminary version of the optimal layout. The optimal layout is then 
described in section 3.3. In section 3.4, we extend the methods developed in earlier 
sections in order to show that certain useful supergraphs of the N-node shuffle- 
exchange graph can also be laid out in O(N2/log2N) area. We have also included 
an appendix to the chapter in which we prove Lemmas 3-1] through 3-4. 


3.1. Preliminaries 


The layouts described in this chapter are based on some important combinatorial 
properties of strings which contain long blocks of consecutive zeros. Before 
describing the layouts, however, it is useful to review some of these properties. In 
this section, we mention several combinatorial lemmas and definitions which will 
be heavily used in the analysis which follows later. As the proofs of the lemmas 
are somewhat complicated, they have been included in the appendix. 


In what follows, we will be particularly interested in the size and location of the 
longest block of consecutive @-bits in the k-bit binary string associated with each 
node. In order that the size of this block be the same for all nodes within a 
necklace, we allow blocks to begin at the end and end at the beginning of a string. 
For example, the longest block of zeros in the string 0/0/0 starts at the fifth bit and 
has length two. 


Let ¥,(/) denote the number of k-bit strings for which the longest block of 
consecutive zeros has length ¢. For example, ¥ {2)=3. The following combina- 
torial lemma provides a good asymptotic bound on the growth of ¥,(/). 
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Lemma 3-1: For (logk/2+logink < t <“ k and k-0, 
¥ <0) ~~ ak ( ). 


In order to illustrate the important features of the function in Lemma 3-1, we 
have sketched a graph of rhe versus ¢ in Figure 3-1. The maximum of 
7k¥() occurs at ¢ =-logk-1 whence | 


(t+ 2) -(t+ I) 
eka * - pk? 


Ze) = (e!/- Ive 
= .23865. 


. For 1> logk- 1, rh (4) decreases exponentially as t increases. For t < logk - 1, 
rky ;{Q) decreases doubly exponentially as t decreases. 


.28 
exponential 
an ky (t) aepery 
double wae 
exponenttal 
. dropoff 
0 togk-1 k 
t 


Figure 3-1: Density of k-bit binary strings for which the 
longest block of consecutive zeros has length t. 


Roughly speaking, Lemma 3-1 states that the longest block of consecutive zeros 
in nearly 1/4 of all k-bit strings has length precisely Jogk - J. Further, there are 
not many strings of length k with substantially more than /ogk consecutive zeros 
and even fewer strings for which the longest block of consecutive zeros has length 
substantially less than logk. This information is further quantified in the following 
lemma. - 


Lemma 3-2: The number of k-bit strings for which the longest block of 
consecutive zeros has length less than logk - logink - 1 or length greater than 2logk 


is at most O(2*/k) = O(N/logN) . 


As we mentioned in Chapter 2, we may ignore O(N//ogN)-sized sets of nodes 
which have undesirable properties. As such nodes can be inserted with the 
addition of at most O(NV/logN) vertical and horizontal tracks, we can always add 
them later without increasing the total area by more than a constant factor. By 
Lemma 3-2, we can thus henceforth consider only those nodes for which the - 
longest ‘block of zeros has length between Jogk - logink - 1 and 2logk. 


We will also be interested in the size of the second longest block of consecutive 
zeros in each string. Usually, the size of the second longest block of zeros will be 
very close to the size of the-longest block of zeros. We state this observation more 
precisely in the following lemma. 


Lemma 3-3: The sum over all necklaces of the difference in length between the 
longest and second longest blocks of consecutive zeros is at most O(N/logN). 


Using information about the size and location of blocks of zeros within the 
necklace, it is possible to distinguish one particular node in the necklace. More 
precisely, we define the distinguished node of a necklace to be the node containing 
the longest leading block of zeros. For example, 00/0/ is the distinguished node of 
<01010>. Should two or more: nodes of a necklace begin witly equal and maximal 
length blocks of zeros, then each node of the necklace contains at least two blocks 
of zeros of maximal length. In such cases, we distinguish that node for which the 
leading block of zeros is maximal and for which the second occurence of a 
maximal length block of zeros is as near as possible to the beginning of the string. 
For example, 0/0/1 (not 0/10/) is the distinguished node of the necklace </0/0)>. 
For some necklaces, such as <J//J> and <J0/0/0]>, there is no uniquely 
distinguished node. As we show in the following lemma, such necklaces are 
sufficiently rare that we need not consider them further. ~ 


Lemma 3-4; At most O(N/logN) nodes are contained in necklaces which fail to 
have a uniquely distinguished node. 


We refer to the leading block of zeros of a distinguished node as the primary 
block of zeros. If a distinguished node has two or more maximal length blocks of 
zeros, then the maximal length block following the primary block is referrred to as 
the secondary block of zeros. These definitions can be easily extended to any node 
contained in a necklace which has a uniquely distinguished node. For example, 
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the primary block of zeros of 0/0/0 starts in the fifth bit and has length two. Note 
that this string does not have a secondary block of zeros. As another example, we 
note that the secondary block of zeros in the string //0/0 consists solely of the fifth 
bit. Note that the secondary block of zeros (if it exists) always has the same length 
as the primary block of zeros. 


If the last bit of a node occurs in the primary block of zeros, we call that node a - 
primary node. Similarly, if the last bit of a node occurs in the secondary block of 
zeros, we call the node a secondary node. For example, /0//0 is a primary node, 
11010 is a secondary node and /00/0 is neither primary nor secondary. 


Note that all primary and secondary nodes are necessarily even. (We say that a 
node is even if its last bit is 0 and odd if its last bit is /.) Note also that, by Lemma 
3-2, we need only consider necklaces which contain between /ogk - logink- 1 and 
2logk primary nodes. Such necklaces will also have at most 2logk secondary 
nodes. 


In what follows, we will represent nodes in terms of their corresponding 
distinguished nodes. More precisely, we use the notation @y.;+ + +@;, )4@j.7° + +4 
to denote the node a,,++-+dgay.;--+a;. For example, 00/01 denotes the node 
10010. Using this notation, a primary node has the form 0---0---Ow while a 


secondary node has the form 0---Ow'0.--0---Ow" where 0---Ow and 
0---Ow'O0-+-Ow" are assumed to be distinguished nodes. 


3.2 A Near-Optimal Layout 


We are now prepared to describe a near-optimal preliminary version of the 
optimal layout. In section 3.3, we will show how to modify this layout in order to 
construct an optimal O(N2/log2N)-area layout for the N-node shuffle-exchange 


graph. 
3.2.1 Location of the Nodes 


The near-optimal layout is constructed from a logN x O(N/logN) grid of 
nodes. Each column of the grid corresponds to a necklace of the shuffle-exchange 
graph. The nodes of each necklace are ordered from top to bottom so that the ith 
node is a left cyclic shift of the (i-/)st node for each i and so that the distinguished 
node is placed in the bottom row. The necklaces are crdered from Ieft to right so 
that the values of the distinguished nodes form an increasing sequence. For 
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example, we have constructed such a grid for the 32-node shuffle-exchange graph 
in Figure 3-2. In the figure, we have represented each node in terms of the 
associated distinguished node. This representation readily illustrates the fact that 
the last bit of any node in the ith row corresponds to the ith bit of the associated 
distinguished node. Note that the necklaces <00000> and <J/1J11> have not been 
included since they are degenerate. 


sd sil al al wl 
cel ne tl alk 
eel val el eal 
soll el 


00001 0001T 0010L 0011T 0101T 0111T 


01111 


01111 


01111. 


01111 


_ Figure 3-2: The grid of nodes for the 32-node shuffle-exchange graph. 


3.2.2 Insertion of the Edges 


It is easily observed that the shuffle edges can be inserted in the grid with the 
addition of O(N//ogN) vertical and 2 horizontal tracks. In the following, we will 
show that the exchange edges can ‘be inserted with the addition of 
O(NloglogN/logN) vertical and horizontal tracks. Thus the total area of the layout 
is O(N(loglogN)*/log’N). This is only a factor of O((/oglogN)*) off from the 
lower bound of O(N?2/log2N). 


The analysis is divided into two parts. In part (a), we show that only 
O(NloglogN/logN) exchange edges link nodes which are in different rows of the 
grid. Thus such edges can be inserted with the addition of at most 
O(NlozglagN/logN) vertical and horizontal tracks. In part (b), we conclude the 
analysis by showing that at most O(N//ogN) horizontal tracks are needed to insert 
the exchange edges which link two nodes in the same row. 
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(a) Exchange Edges Which Link Nodes in Different Rows 


Consider an exchange edge which links two nodes that are in different rows of 
the grid. In particular, assume that the edge is incident to an even node in the ith 
row for some i. By definition, the even node can be represented as wow' where 
j= and w0w' is the distinguished node of <w0w'>. The exchange edge is 
also incident to the odd node.w/w'. By assumption, w/w’ is not located in the ith - 
row and thus w/w' is not a distinguished node. Since wow' is a distinguished 
node, we know that the ith bit of wOw' (the bit that was changed in order to 
produce wiw') must be in the primary or secondary block of zeros of wow'. 
Otherwise, the primary and (if it exists) secondary blocks of zeros of wi/w' would 
be identical in location and size to the primary and secondary blocks of wow'. 
This would imply that w/w' is also distinguished, a contradiction. Thus wow' 
must be a primary or secondary node. As was previously mentioned, we can 
assume that each necklace has at most 2/ogk = 2loglogN primary and 2/oglogN 
secondary nodes. Thus at most 4/oglogN nodes in each necklace are both even and 
incident to an exchange edge which links nodes in different rows. Since every 
exchange edge is incident to an even node and since there are O(N//ogN) 
necklaces, we can conclude that there are at most O(NloglogN/logN) oe 
edges which link nodes in different rows. 


(b) Exchange Edges Which Link Nodes in the Same Row 


We next show that those exchange edges which link two nodes that are in the 
same row can be inserted with the addition of at most O(N//ogN) horizontal tracks. 
Once again, the analysis is divided into two parts. In the first part, we show that at 
most O(N/logN) exchange edges are contained in the first Jogk rows. Such edges 
can be trivially inserted with the addition of O(N/logN) horizontal tracks. In the 
second part, we show that only 2k horizontal tracks are needed to insert ‘the 
exchange edges in the ith row for any i> logk. Since’ 5 au < yk = 
N/logN ,_ this will be sufficient to show that at most OUW/logN) additional 
horizontal tracks are ‘necessary to insert the remaining exchange edges. 


Consider a necklace which has ¢ primary nodes for some t</ogk. By definition, 
the nodes in the first ¢ rows of such a necklace are all even. Thus, such a necklace 
can have at most r = logk - t odd nodes in the first Jogk rows. By Lemma 3-1, 
we know that there are 


“42 -t] 
Uk ~ (27K) (ek - ek?) 


such necklaces for (/ogk)/2+ logink < t<< k. By Lemma 3-2, we can assume that 
t > logk - logink - ] and thus the total number of odd nodes occurring in the first 
logk rows is at most 


to 2 

3 S (logk- 9) (207i) (@t2". 2") 
fe cinkae . 
(2 kya) > r(e -k dil logk . eo art tosk ) 


(24/8) Sr rez -¢F ") 


Keri 


2/4) pen 


IA 


(2*/k) > 2? 
O(N/logN) . 


Since every exchange edge is incident to an odd node, the above bound implies 
that at most O(N/logN) exchange edges are contained in the first /ogk rows. 


We next consider the number of horizontal tracks necessary to insert the 
exhange edges contained in the: ith row for Dlogk. This number is identical to the 
maximum number of exchange edges that can overlap each other at a single point 
of the ith row. In Figure 3-3, we illustrate the necessary conditions for two 
exchange edges to overlap in the ith row. All representations are in terms of 
distinguished nodes. 


ww" wiw" 
; aaa aera mercies. 
level & woo! whe! 
wow wlw™ 


|w| = i-1 wm <wl < yl 


Figure 3-3: Necessary conditions for exchange cdges to overlap in the ith row. 
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Note that the even end of an exchange edge is always to the left of the odd end. 
Also note that any node which occurs between wOw' and wiw' must be 
represented as wow" where w">w' or as w/w" where w"Xw'. In either case, the 
exchange edge incident to the overlapped node extends beyond the exchange edge 
linking wow' to wiw'. Since there are at most 2-7 nodes between w0w' and 
wiw', these facts imply that at most 24-! exchange edges can overlap at any point 
of the ith row. This observation completes the argument that the near optimal | 
layout requires only O(N(loglogN)?/log?N) area. 


3.3. An Optimal O(N?/log2N)-Area Layout 


In this section, we will modify the layout described in section 3.2 in order to 
produce an optimal O(N*/log*N)-area layout for the N-node shuffle-exchange 
graph. In particular, we will relocate the primary and secondary nodes of each 
necklace so that they are closer to and in the same row as the nodes to which they 
are linked via an exchange edge. Before going into the details of this relocation, 
however, it is necessary to introduce some additional terminology. 


3.3.1" More Definitions 


In order to construct an optimal layout for the shuffle-exchange graph, we have 
found it necessary to break up each necklace into two or, possibly, three pieces. 
The basic piece of each necklace consists of all those nodes which are neither 
primary nor secondary. The primary piece of each necklace consists of the primary 
nodes while the secondary piece consists of the secondary nodes (if there are any). 
For example, the basic piece of <0/0/1> is {01011, 01011, 01011}, the primary 
piece is {0/0//}, and the secondary piece is {010//}. 


It is also necessary to extend the notion of a distinguished node to include pieces 
of necklaces. The distinguished node of a basic piece is the same as the 
distinguished node of the associated necklace. The distinguished node of a primary 
piece of a necklace is that node. of the necklace which becomes distinguished when 
we ignore the primary block of zeros (i.e, when we temporarily replace the 
primary block of zeros in each node of the necklace with an equal-length block of 
ones). Similarly, the distinguished node of a secondary piece of a necklace is that 
node which becomes distinguished when we ignore the secondary block of zeros. 
For example, 0/0//0/11 is the distinguished node of the basic piece of | 
<OIOIIOI1I>, 011011101 is the distinguished node of the primary piece, and 
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O11101011 is the distinguished node of the secondary piece. Note that the 
distinguished nodes of the primary and secondary pieces of any necklaces are 
necessarily odd nodes and thus are contained in the basic piece of the necklace. 


It is important to note that some necklaces (such as <O//11I>) have a 
distinguished node but do not have a distinguished node for the primary or 
secondary piece of the necklace. Fortunately, arguments such as those used to 
prove Lemmas 3-3 and 3-4 can be used to show that at most O(/logN) nodes are 
contained in such necklaces. Thus, we can assume henceforth that every piece of 
every necklace has an associated distinguished node. 


3.3.2 Location of the Nodes 


As in section 3.2, the layout is constructed from a logN x O(N/logN) grid of 
nodes. Each column of the grid corresponds to a piece of a necklace. The nodes 
of each piece are arranged within a column so that a node of the form | 
Ay.j°* *Gz.j++Aq (where a,.;+++ag is assumed to be the distinguished node of 
the associated piece) is placed in the ith row of the grid. Note that nodes in the 
basic piece of any necklace (these include all odd nodes) are in the same row as 
they were in the near-optimal layout described in section 3.2. The columns are 
ordered from left to right so that the values of the distinguished nodes of the 
associated pieces form a nondecreasing sequence. For example, we have 
constructed. such a grid for k=5 in Figure 3-4. 


01001 
01001 [01011 a 


00101 01011 


basic. primary basic secondary primary 
<00101> <00101> <01011> <01011> <01011> 


Figure 3-4: Relocated nodes for the 32-node shuffle-exchange graph. 
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Note that the necklaces <00001>, <00011>, <OO11I>, and <0/111> have not been 
included in Figure 3-4 since their associated primary pieces do not have 
distinguished nodes. 


3.3.3 Insertion of the Edges 


As each necklace is broken up into at most four contiguous pieces in the 
‘modified grid (the basic piece may have been broken up into two contiguous 
pieces), the shuffle edges can be inserted with the addition of at most O(N/logN) 
vertical and horizontal tracks. In what follows, we will show that at most 
O(NZlogN) vertical and horizontal tracks are needed to insert all of the exchange 
edges as well. Thus the area of the layout will be O(N?/log?N), which is optimal. 


As before, we divide the analysis of the exchange edges into two parts. We first 
‘show that at most O(N/logN) exchange edges link nodes which are in different . 
‘rows of the grid. Such edges can thus be trivially inserted with the addition of at 
most O(N/logN) vertical and horizontal tracks. We then show that those exchange 
‘edges which link two nodes in the same row can be inserted with the addition of 
only O(N/logN) horizontal tracks. The arguments will be very similar to those in 
section 3.2.2. 


(a) Exchange Edges Which Link Nodes in Different Rows 


__ Consider an exchange edge which links two nodes which are in different rows of 
-the grid.- Since only primary and-secondary nodes have been relocated, we can 
conclude from the arguments of section 3.2.2a that the even node which is incident 
‘to the edge is either a primary or secondary node. In what follows, we will show 
that the even node is, in fact, a primary node. 


Assume for the purposes of contradiction that the even node is a secondary 
node. Then this node can be represented as wOw' where w0w' is the distinguished 
node of the secondary piece of <wOw'> and |wj=/ for some i By definition, 
w0w' is located in the ith row of the grid and is linked to wiw' via the exchange 
edge. Since wiw' is odd, it is contained in the basic piece of <w/w'>d. By 
assumption, w/w' is not also in the ith row and thus w/w' cannot be the 
distinguished node of <w/w'>. Since the Iengths of the two blocks of zeros in 

~ wiw' created by switching the ith bit from 0 to / are less than the length of the 
primary biock of zeros (in fact, the sum of their lengths is precisely one less than 
the length of the primary block), w/w' will be the distinguished node cf <w/w'> 
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precisely when w0w'is the node distinguished in <wOw'> by ignoring the 
secondary block of zeros. By definition, this is the case precisely when w0w' is the 
distinguished node of the secondary piece of <w0w'>. By assumption, w0w' is the 
distinguished node of the secondary piece of <wOw'> and thus we can conclude 
that w/w' is the distinguished node of <w/w'>, a contradiction. : 


Next consider a primary node which is incident to an exchange edge linking two - 
nodes in different rows of the grid. By the preceding arguments, this node must be 
ty I . 
of the form wl0.-..- 00>. = Olw' where w/0--. Olw' is the distinguished 
node of the primary piece of <w/0.-- Olw'> and either ¢, or fy is larger than or 
equal 2 the nee of the eae block of zeros in wilw'. Otherwise, 


TOO - Olw' would (by ia is the disineuished node of 


<wlb. ot sa 167 Fry > and thus w10. Te would be on the same 
ty 1) 


TOW as ee, 000. < Ol w' , acontradiction. Each necklace contains at most 
2r such primary nodes where r is the difference between the lengths of the longest 
and second longest block of zeros in any string of the necklace. By Lemma 2-3, we 
can conclude that there are at most O(N/logN) such primary nodes in the entire’ 
shuffle-exchange graph. Thus, at most O(N//ogN) exchange edges link nodes 
which are in different rows. 


(b) Exchange Edges-Which Link Nodes in the Same Row 


Using the analysis developed in section 3.2.2b, it is not difficult to show that at 
most O(N/logN) horizontal tracks are needed to insert the exchange edges which 
link two nodes that are in the same row. In particular, there are still only 
O(N/logN) odd nodes in the top /ogk rows of the grid and thus at most O(N//ogN) 
exchange edges are contained in the top /ogk rows. These can be trivially inserted 
with the addition of just O(N//ogN) horizontal tracks. 


Again following the methods of section 3.2.2b, it is not difficult to show that two 
exchange edges overlap on the ith row only if the first / bits of the associated nodes 
are identical. Thus at most 2* tracks are needed to insert all of the exchange 
edges in the ith row for all D/ogk. Summing, we can again conclude that at most 
O(N/ZlogN) additional horizontal tracks are needed to insert the remaining 
exchange edges. 
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3.3.4 Comments | - 


The methods developed in this chapter can be used to find several other optimal 
layouts for the shuffle-exchange graph. The key variant is the method by which a 
node is distinguished. In particular, this method must be impervious to small 
alterations in the necklace. (This is so that most exchange edges will link nodes 
which are in the same row of the grid.) Only by changing the value of a bit in a- 
small segment of the necklace (such as in the primary or secondary block of zeros) 
should we be able to globally change the distinguished node. 


Another method of distinguishing a node is to select that node in the necklace ~ 
which has the minimal value. Although the proof is very difficult, it can be shown 
that the layout for the N-node shuffle-exchange graph constructed in this manner 
has at most O(N’/log?N) area. In the following section we will desribe additional 
methods of distinguishing nodes. 


At this point, we should also note that the layout just described is not known to 
‘have optimal maximum edge length. In Part II of the thesis, we show that every 
layout of the N-node shuffle-exchange graph must have some edge of length at 
least 2(N/log2N). All the layouts we have considered thus far contain wires of 
length O(N/logN). 


3.4 Layouts With Additional Edges ‘ 


For some applications (such as the calculation of the discrete Fourier transform), 
it is useful to consider networks which have more than just shuffle and exchange 
edges. In particular, we will be interested in layouts for the shuffle-exchange graph 
which also include shift, reverse and transpose edges. In what follows, we will 
show how to modify the optimal layout for the shuffle-exchange graph so that 
these additional edges can be inserted without increasing the total area by more 
than a constant factor. 


3.4.1 Shift Edges 


Shift edges link the ith node to the (i+ J)st node for all odd ii When combined 
with the exchange edges, the resulting network will have links between the ith and 
the (/+ /)st nodes for all i. The inclusion of such edges facilitates the computation 
of discrete Fourier transforms at sequential intervals of a continuous signal. In 


such applications, the input data contained in the ith processor is shifted to the 
(i+ 1)st processor for each i after each computation of a discrete Fourier transform. 
The graph consisting of shuffle, exchange and shift edges is known as the shuffle- 
shift graph. 


Using the methods developed in section 3.3, it is not difficult to show that the 
N-node shuffle-exchange graph can be laid out using only O(N2/log?N) area. As 
before, the necklaces are broken into two or three pieces and placed in a grid 
according to the value of the associated distinguished node. Thus the shuffle edges 
can be inserted as before using only O(N/logN) vertical and horizontal tracks. 


For most odd nodes, adding a / to the value of the node changes only a 
relatively small number of bits at the end of the string. Thus it can be shown that 
at most O(N//ogN) shift edges link nodes which are in different rows. These can 
be easily inserted using only O(N//ogN) vertical and horizontal tracks. Of those 
edges which link nodes in the same row, at most O(N/logN) are contained in the 
first logk rows. For Dlogk, at most 2k shift edges overlap at any point of the ith 
row. By introducing an extra vertical track for each necklace piece, it is possible to 
separate the layout of the shift edges on each level from that of the exchange 
edges. Thus both can be inserted simultancously in the ith row using only O(2k-4 
total horizontal tracks. By the arguments of section 3.3, this means that at most 
O(N/logN) additional horizontal tracks are needed to embed all of the remalung 
shift and exchange edges, thus completing the argument. 


3.4.2 Reverse Edges 


Reverse edges link pairs of nodes that are associated with binary strings which 
are reverses of each other. For example, a,.)---dg is linked to ag---a,., viaa 
reverse edge. Since the algorithm which computes discrete Fourier transforms on 
the shuffle-exchange network leaves the output for node aj.,;---ag in node 
dg+++Qy.;, Teverse edges provide a fast and convenient way of straightening out 
the solution. The graph consisting of shuffle, exchange, shift and reverse edges will 
be referrred to as the shuffle-shift-reverse graph. 


_ Using the techniques developed in section 3.3, it is also possible to show that the 
N-node shuffle-shift-reverse graph can be laid out in O(N7/log2N) area. The basic 
idea is to modify the layout described in section 3.4.1 so that 
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1) pieces of necklaces which are reverses of each other are paired together in 
the left-to-right ordering, and 


2) pieces of necklaces are folded in half. 


The first constraint insures that the maximal overlaps of the reverse edges in 
each row will be small while the second constraint insures that most reverse edges 
link nodes which are in the same row. Although it is not immediately obvious, it 
can be checked that these modifications do not substantially change the procedure 
for inserting the shuffle, shift and exchange edges which was described in section 
3.4.1. Thus all of the edges can be inserted using at most O(N//ogN) vertical and 
horizontal tracks. 


3.4.3 Transpose Edges 


Transpose edges link the ith node to the (N-1-/)th node for each i. Viewed in 
terms of binary strings, transpose edges link each node to its complement. 
Although we do not know of any specific applications of transpose edges, they 
would be useful for problems that require frequent transposition of the data. 


Ry further modifying the optimal layout for the shuffle-shift-reverse graph, it is 
possible to add transpose edges without increasing the total area by more than a 
constant factor. In particular, the layout should be modified so that 


1) pieces of necklaces which are complements of each other are paired together 
in the left-to-right ordering, and 


2) the distinguished node is selected on the basis of the location of the longest 
block of consecutive identical.-bits (be they zeros or ones), 


The first constraint insures that the maximal overlaps of the transpose edges in 
each row are small while the second constraint insures that most transpose edges 
link nodes which are on the same row. Although we do not present the details 
here, it is possible to show that such a layout can be constructed using only 
O(N7/log?N) area, the least possible. 
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~ Appendix: Proofs of Lemmas 3-1 Through 3-4 


We now present the proofs of Lemmas 3-1 through 3-4. Such results can also be 
found in the recent work of Guibas and Odlyzko [GO81la,GO81b]. We are deeply 
indebted to Kleitman for suggesting the proof of Theorem 3-1. 


In what follows, we will write ¥ 0 to denote the number of k-bit strings . 
which do not contain t-J consecutive zeros. Except for the string of all zeros 
(which we ignore), these are precisely the strings which do not contain the 

F . 


tC-— 
substring v,= /0-..0. The proofs of Lemmas 3-1 through 3-4 depend heavily 
on the following combinatorial result. 


Theorem 3-1: For large t and k, 
| FE) = eke" ORI"), 


Proof: We first count the number ¥,'() of k-bit strings which do not contain 
an occurrence of v, between the beginning and end of the string (i.e., for the time 
being we ignore the occurrences of v, which begin at the end and end at the | 
beginning of a string). 


Fix ¢ and let Jf; denote the number of ‘bit strings ending with v, but which do . 
: oo. 
not contain any other occurrences of v, in the string. Set F(x) = Difx!. Note 
— c ane 
that ¥,'(d is the (k+ dth coefficient of F{x). Let J; y denote the number of bit 
strings ending in v, which contain precisely j occurrences of v, and set 
of 
My = SPx. 

Since occurrences of v, cannot overlap, it is not difficult to show that Fx) iS 
identical to F(x) J for all Jol. 

Let g; be the number of bit strings which end in v, (regardless of the number of 


other occurrences of v, which appear in the string) and set G(x) = ¥ gx! . Since 
° . 80 
g=2'' for all i> 1, it is easily seen that G{x) = x(J-2x). Also note that 


ax = SPQ 


Jj?! 
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(—— 


= Av! 


jr 


= [17(-Fxy)] -1 
and thus that 
Fx) = Gx)/(GX) + J) 
_ xE/(1 - 2x + x4. 


Thus ¥( is simply the kth coefficient of 1/7 (J - 2x+ x‘). For example, 
¥,'(D=5 which is the coefficient of x4 in the expansion of. 1 / (1 - 2x+x?). 


Let p(x) = 1 - 2x+x!. Itis easily observed that gcd(p(x), dxxdx) = 1 and 
thus that p(x) does not have any multiple roots for +> 2. Thus we can expand 


t 
Ax! = > A;/ (x-r) 
_ where {r;|1 < i< Q is the set of distinct (and possibly complex) roots of p(x) and 
4; = [rr¥p)|, 
= 1/[dpxVdx, 


for 1< i<1t. Once the roots of p(x) are known, we can calculate ¥,'°0 from 
the formula . - 


t 
¥,'() = 2 Ait . 
é= 


Although we do not know how to find the roots of p(x) explicitly for large t, we 
can describe them asymptotically. First observe that as +090, the absolute value 
of every root must approach either //2 or J. Otherwise the absolute value of one 
term of p{x) will dominate the sum of the absolute values of the other two terms. 
For example, if | <c< 1/2 as t-00 for some root r and constant c, then 
1 > |24+4|r] for large ¢. . . 


If there are to be any roots r such that |7-//2, it is essential that r-1/2. 
Otherwise, the real part of p({r) cannot vanish for large ¢. - By substituting 
(1/2)e(9 for r where s(1)-0 as 1-00, we find that 
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1-M)+7'ekD = 0 
and thus that 
1-(+ 2) + O((0))) + 2! + Ot) = O 


Thus sf) = 2'+q2) where |i] << 2! as t-+00, Another iteration of this — 
process reveals that g(2)=O(1224) and thus that 


- 2 
r= (1/2) & A2") as pro. 


In fact, there is precisely one root, say r, , which approaches 1/2 as I-+©, 
The absolute values of the remaining roots approach /. In particular, the absolute 
values of these roots must be greater than or equal to / for large . Otherwise there 
would be a root r and a function e(f)-07 such that |7A=J-e() . But then 


‘27 2- 2e(d 


Vv 


1+ {1 - e(d|! 
1+ Ir 


for. D2 and it would be impossible for p(r) to vanish for large 7, a contradiction. 


It remains to compute the A;. Since dp(xVdx = uxt! - 2, we find that 
A, = -(1/2)+O(t2) and that. A; = O(//) for 2<i<t. Thus 


Fp) = OD) ~ 1/2 + OMT] 2+! elk+ DZ O(K2”) 
Replacing 1+O0(:2 with ¢(2) and simplifying, we conclude that 
¥,'(0 = 2 ek2™ 0012", kt?" 
for large t and k. 


The only strings which are included in the count of ¥,'(0 but not in that of 
i t-1 : 
W (2) are. those of the form 0-.-Ow/0.. .0 where 1 <i<t-/ and wisa string 


which is included in the count of ¥,.,'() . Thus 


FO = ¥0 - (ODF EO 
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2k ek?! O(2", 2) - (p= I) Qket o(ke2" 0(12" 124 


2k erk2 C12", 1274 

for large ¢ and k.” This completes the proof of the theorem O 
We can now prove Lane a and 3-2. 
Proof of Lemma 3-1: From the definition, we know that 


VQ) = ¥,t+2 - V+) 


- “ty 2 (+1 “ty ed 
=k hI? 0002 aD) ok gk ZO? (ea, ka) 


for large tand k, For ¢ > (logk/2+loglogk, both t2' and kt22! vanish as 


k--co, In what follows, we will show that if ¢<< kK, then 
(tt) - 1 
ghZ? ot  >y O(t2"!, kt726) 


and thus that 


-(t+ 2) -(t+ 1) 


) 


Assume for the purposes of contradiction that 


-(t+2) -(t+ 1) 

ek2 - ek2 O12", kt224) . 
(t+). (+), 42), ,9ft+) 

Then, eX? ~ eke which means that ek2 ~~ +k2 ~ 1] and 

thus that A2+9 1g, Thus we can use a Taylor series expansion of the 


exponentials to find that 
gare?) : ekzer) . (1 = Katt . (= Katt D) 
kL Ky t2) 
>> O12", kt7*4) 
provided that «<< k, a contradiction O 


Proof of Lemma 3-2: The number of k-bit strings which do not contain a block 
of logk - logink - 1 consecutive zeros is 
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-logk + logink 
2k erk2 


Y (logk - logink) 


Yk 


O(N/logN) . 


The number of k-bit strings which contain a block of 2/ogk +1 consecutive zeros. 


2k - F,(2logk+2) ~ 2k - 2k gk? O((logkVe’) 

os 2k ~ QKED - 14k) + O((logkVkA)] 
~ 2k/4k 
= O(N/logN)O 


_ The proofs of Lemmas 3-3 and 3-4 depend on the following corollary to 
Theorem 3-1, . 


Corollary 3-1: For bounded m and p and large k and 1 
Bre 
Z¥iemef9 = OVEN . 
Proof: : We first observe that for ¢ < 2logk/3 , 


Vemepd < ¥;{2logk/3) 
2 k e k gy Zlogky3 


2) 
2k ek 


i 


and thus | that 
\ 


pm ; 173 
Vem l) S (2/3) logk kek 
te 
<< 2ky~Rm 
for any finite m and p as k-oo , 


For larger values of 4 
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7 ~  okemttp kd! 
V emi+ pO aon e 


and thus ; 
kee Kee 
SF (2) 3 ok-mt+ p kz" 
7 yj ~ e - 
t:tlesk k-mt+p us “as 


3 
By making the change of variables r = t- Jogk , we can see that the preceding» 
sum is at most 


pao -r 
( Dk+p / kM) >, zyimr @2 
rr) ' 


Yeo 


and thus at most O(2*“k™") = O(N/logN) O 
Proof of Lemma 3-3: A string whose longest block of zeros has length t and 
tt 


whose second longest block of zeros has length s<z is of the form wld. - -0w', 
where the longest block of zeros in ww' has length s. By definition, there are at 
‘most k¥,_,.,(s) such strings. Thus the sum over all necklaces of the difference 
between the sizes of the longest block and second longest block of zeros is at most 


x ¢t ; 
< (Ib 2 2a (1-3) k¥, AS 


K ¢ = - 
2 2 (t-3) [Vy 842) - Yep StI 


== > (2k ek2* 0(s2", ks25) > 21 eld") 
S*i tes 
< > (2k gk2* (0(s2", ks) 7s 60(52")) 
ss 
k 
= aks gh2* (52%, ks2) 
$1 
K 
< 2 ¥;-£5) 
= O(N/logN) 


by Corollary 3-1 0 
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Proof of Lemma 3-4: Consider a necklace which fails to have a uniquely, 
distinguished node. Each node in such a necklace must have one of the following 
three forms: 


]) "Ro Bi 0-- -Qw3Q:- Ins, 
2) w/0- CP TIL.. ooMs or 


r v 
een F ine 
3) w/0- : -Qw20: --Ow-- Ow: --Qw 
¢ t 4 ¢ 2 
where ¢ is the length of the longest block of zeros in any of the strings. It is easily 
seen that there are at most 


wiz 
1) k 2 Wy.9{t+2) nodes of the first type, 
us : 
2) k? 2 V,.3{t+2) nodes of the second type and 
n 


; «ly _ 
3) PD WV ,.4{1+2) nodes of the third type. 


By Corollary 3-1, we can thus conclude that there are at most O(N/logN) such 
nodes altogether O 
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CHAPTER 4 


PRACTICAL LAYOUTS 


Although the O(N2/log?N)-area layout for the shuffle-exchange graph described 
in Chapter 3 is (up to a constant) asymptotically optimal, it is not optimal for small 
values of N (e.g., N=128). In fact, none of the general layout procedures thus far 
discussed provide good layouts for small shuffle-exchange graphs. For practical 
applications, however, these are precisely the shuffle-exchange graphs for which we 
need good layouts. 


In this chapter, we descibe techniques for finding good layouts for small shuffle- 
exchange graphs. Although the techniques (which are described in section 4.2) do 
not yet constitute a general procedure for finding truly optimal layouts for all 
shuffle-exchange graphs, they can be used to find “very nice" layouts for "small" 
shuffle-exchange graphs. As examples, we have included layouts for the 8-node, 
16-node, 32-node, 64-node and /28-node shuffle-exchange graphs in section 4.3. 
The layouts ‘are "very nice" in the sense that: 


1) they require much less area than previously discovered layouts, 


2) they have a certain natural structure which facilitates efficient layout 
description, chip manufacture and I/O management, and 


3) they require the minimal amount of area for layouts with such structure. 


4.1 Preliminaries 


We have chosen to use the Thompson grid model [T80] to illustrate our 
techniques because of its widespread acceptance and its simplicity. For practical 
layouts, however, the assumption that processors can be represented by points is 
clearly false. Nontheless, we show in section 4.1.1 that good Thompson model 
layouts can still be used to find good practical layouts. Thus we will be able to rest 
assured that the Thompson model is, in fact, an acceptable means for describing 
practical layouts of the shuffle-exchange graph. 


We must also be sure that the /ayouts we design can be effectively used in 
practice. For example, it is important that the layouts have a suitable input/output 
structure so that data can be put on and taken off the chip efficiently. In section 
4.1.2, we describe a general class of layouts for the shuffle-exchange graph which 
appear to satisfy such constraints. The remainder of the chapter will then be 
devoted to finding optimal layouts within this class, 


4.1.1 A Closer Look at the Thompson Model 


The manner in which the Thompson model is useful for describing practical 
layouts varies with the size of the processors involved. For example, if one desires 
to use the shuffle-exchange graph as a permuter, then each processor need only 
contain k storage registers and some I/O hardware. Such a processor can be easily 
hardwired in a kxk square. In order to achieve maximum parallelism, each wire of 
the Thompson model layout is reproduced & times so that an entire k-bit word can 
be transmitted in one time step. For example, the optimal 2x6 Thompson model 
layout for the 8-node shuffle-exchange graph (which is shown in Figure 4-3 in 
section 4,3) can be transformed into the more realistic 6x/8 layout shown in Figure 
4-1 by tripling the grid lines and replacing the point processors by 3x3 boxes (into 
which the guts of each processor can later be wired). 


Figure 4-1: A transformed Thompson model layout 
Sor the 8-node shuffle-exchange graph. 


For some applications, the processors themselves require an entire chip. For 
example, every processor of a shuffle-exchange graph used to compute discrete 
Fourier transforms must be equipped with a floating point multiplier. Using the 
best technology currently available, only a few floating point multipliers can be 
wired onto a single chip. In this case, a Thompson model layout can be used to 
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design an efficient /ayout of chips where each chip contains a single processor. 
(Such a device is currently under development at IBM.) The wires, as before, are 
replicated to achieve maximum parallelism but now serve as links between chips. . 
Since the wires must be much wider in such a device, the side length of a processor 
(the chip) is about the same as the combined width of all the wires (pins) attached 
to it. By following an expansion procedure similar to the one described in the 
previous example, a good Thompson model layout can thus be used to design a 
good practical layout. 


41.2 <A Class of Practical Layouts 


In this chapter, we will consider layouts for the shuffle-exchange graph for 
which: 


1) each necklace appears as a rectangle consisiting of arbitrarily long segments 
of two vertical tracks and unit length segments of two horizontal tracks, 


2) the horizontal tracks are divided into pairs, each pair containing at most one 
full necklace and any number of degenerate necklaces, and 


3). each exchange edge appears as a horizontal line segment. 
For example, the layouts described in Chapter 2 have this form. 


Such layouts are particularly well suited for practical implementation since their 
structure facilitates efficient description, chip manufacture and data management. 
For example, by attaching a pin to each of the @(N//ogN) necklaces (this is 
feasible for small NV), it is possible to load N input values into an N-processor 
shuffle-exchange chip in just O(/ogN) steps. 


Even more importantly, we will show in the following section how to find 
layouts with the above form which require very small amounts of area. Thus very 
little is lost by restricting our attention to such layouts. 


4.2 Optimization Techniques 


In this section, we explain how to find layouts for small shuffle-exchange graphs 
which are optimal up to the constraints described in section 4.1.2. For the most 
part, our methods are comprised of common sense, heuristics and exhaustive 
scarches. 
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4.2.1 Ordering the Necklaces 


The first step in finding optimal layouts of the form described in section 4.1.2 is 
to order the necklaces from left to right so that the number of exchange edges 
which overlap at each point of the ordering is kept small. More precisely, we wish 
to find an ordering of the necklaces for which the maximum number of exchange 
edges overlapping at any point is minimized. For example, no more than 6 | 
exchange edges overlap at any point of the ordering used to produce the layout for 
the 32-node shuffle-exchange graph shown in Figure 4-2. If we switched the 
necklace <5> with <JJ>, however, 9 exchange edges would overlap in the gap 
between <7> and <5>. Since the maximum overlap is a lower bound on the 
number of horizontal tracks necessary to insert the exchange edges, we can easily 
see that the latter ordering is inferior since any layout it produces must have at 
least 9 horizontal tracks. Note that the layout in Figure 4-2 has just 6 horizontal 
tracks. 


Figure 4-2: A good ordering of the necklaces 
Sor the 32-node shuffle-exchange graph. 


As we mentioned in Chapter 3, it is not known how best to order the necklaces 
in general. For small shuffle-exchange graphs, however, there are several simple 
heuristics which produce optimal orderings. For example, arrangements of the 
necklaces from left to right in order of nondecreasing size or, alternatively, in order 
of increasing minimal number represented are usually quite close to optimal for 
small shuffle-exchange graphs. In fact, such orderings are within a necklace swap 
of optimal for N¢256 (k<8). Note the the ordering displayed in Figure 4-2 could 
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have been produced by either of these methods. 


Probably the most difficult task is proving that a good ordering is, in fact, 
optimal. The techniques we have used to prove optimality depend heavily on 
exhaustive searches. For k<8, the techniques have suceeded in proving the 
optimality of good orderings. For 9<k<J3, we have found good orderings but 
have been unable to prove that they are optimal. We have summarized the results 
in Table 4-1. Note that for each k, the maximum overlap of the best known 
ordering serves only as a /Jower bound for the number of horizontal tracks that will 
be required for any layout with that ordering. In some cases, additional horizontal 
tracks may be required. 


Table 4-1 


Maximum Overlap of Best Known Orderings 


maximum overlap of 


k N best known ordering — optimal? 
3 8 2 yes 
4° 16 3 yes 
5 32 6 yes 
6 64 10 yes 
7 128 . 18 yes 
8 256 330 yes 
9 512 62 ? 
10 1024 115 ? 
11 2048 214 ? 
12 4096 388 z 
13 8192 754 ? 


4.2.2 Inserting the Exchange Edges 


The second step in constructing optimal layouts for small shuffle-exchange 
graphs is to insert the exchange edges using as few horizontal tracks as possible. 


Recall that in Chapter 2, we showed how to use the complex plane diagram as one 
method of inserting the exchange edges. Although this method is theoretically 
nice, it is not very practical since it uses an excessive number of horizontal tracks to 
insert the exchange edges. For example, /0 horizontal tracks were used to insert 
the exchange edges in the layout shown in Figure 2-3 whereas only 6 tracks were 


required in the layout shown in Figure 4-2 (even though the same necklace ~ 


orderings were used for both layouts). 


The complex plane diagram can still be of use when inserting exchange edges, 
however. For example, notice that the top-to-bottom orderings of the exchange 
edges across most of the vertical cuts which are located between necklaces in the 
layout in Figure 4-2 are the same as the orderings for the corresponding cuts in 
Figure 2-3. In general, knowledge of the level structure of the complex plane 


diagram is very helpful in optimizing the insertion of the exchange edges. In fact, - 


we relied heavily on such knowledge when constructing the optimal ‘Tayouts 
displayed in section 4.3. 


For very small shuffle-exchange graphs (eg., for k<5), it is possible to find 
optimal embeddings of the exchange edges by trying all reasonable possibilities. 
For somewhat larger shuffle-exchange graphs (e.g., kK=6,7), however, the task is 
substantially more difficult. In order to find the optimal layouts shown in section 
4.3, we 


1) first located the center of the region of maximum overlap and (using the 
complex plane diagram as a guide) inserted the exchange edges which 
crossed the region (one edge on each horizontal track), 


2) next inserted the exchange edges located in neighboring regions without (if 
possible) introducing any additional tracks, and 


3) lastly inserted the remaining exchange edges (again without adding any new 
horizontal tracks). 


Steps 1 and 3 are easy but step 2 can be difficult. In some cases it is necessary to 
interchange the left and right parts of some necklaces or to slide a node around 
from one part of a necklace to the other. For k = 6 and 7, it is also necessary to 
introduce an extra horizontal track at step 2. For larger shuffle-exchange graphs, it 
would probably be necessary to introduce even larger numbers of horizontal tracks. 
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4.2.3 Additional Savings 


All of the practical layouts we have considered thus far have two horizontal 
tracks which are used solely for the purpose of connecting the left part of each 
necklace to the right part. It is not difficult to show that these tracks can be 
eliminated without affecting the rest of the layout. As an example of how this can 
be accomplished, we suggest that the reader compare the layout of the 32-node - 
shuffle-exchange graph shown in Figure 4-2 with that in Figure 4-5. 


Even larger savings can be had for some shuffle-exchange graphs by doubling 
up the degenerate necklaces with full necklaces in the same pair of vertical tracks, 
thus reducing the number of vertical tracks used. Of course, it is necessary to 
rearrange the exchange edges somewhat but, as degenerate necklaces have very few 
nodes in small shuffle-exchange graphs, this can usually be done without 
introducing any additional horizontal tracks. For example, substantial savings can 
be achieved in this manner for the /6-node and 64-node shuffle-exchange graphs. 


4.3. Optimal Layouts 


In the following figures, we exhibit layouts for the 8-node, /6-node, 32-node, 64- 
node and /28-node shuffle-exchange graphs which are optimal up to the 
constraints described in section 4.1.2. The layouts were found via the techniques 
described in section 4.2. 
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Figure 4-3: A 2x6 layout for the 8-node shuffle-exchange graph. 
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Figure 4-4: A 3x8 layout for the 16-node shuffle-exchange graph. 
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Figure 4-5: A 6x14 layout for the 32-node shuffle-exchange graph. 


Figure 4-6: An 11x18 layout for the 64-node shuffle-exchange graph. 
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Figure 4-7: A 19x36 layout for the 128-node shuffle-exchange graph. 


4.4 Other Layouts 


To this point, we have considered only a specific class of layouts for the shuffle- 
exchange graph. As these layouts are quite good, it is not clear that we need to 
consider others. Nevertheless, it is worth pointing out that slightly better layouts 
do exist for some shuffle-exchange graphs. For example, by considering layouts in 
which the exchange edges are allowed to bend and in which two or more full 
necklaces can occupy the same pair of vertical tracks, it is possible to construct the 
layout for the 32-node shuffle-exchange graph shown in Figure 4-8. 


_ Figure 4-8: An improved 7x9 layout for the 32-node shuffle-exchange graph. 


It is likely that slight improvements can also be made for larger shuffle-exchange 
graphs. At this point, however, we feel that research efforts should be directed 
more towards implementation of the good layouts already discovered. Once this is 
done, it will be much clearer whether or not the effort necessary to further reduce 
the layout area is justified. 


49 


This empty page was substituted for a 
blank page tn the original document. 


PART II 


LOWER BOUND TECHNIQUES FOR VLSI 


CHAPTER 5 


REVIEW OF KNOWN TECHNIQUES 


In this chapter, we review the known techniques for determining the layout area 
and maximum edge length of an arbitrary VLSI network. We also preview the 
results we will prove in Chapters 6 through 8 of the thesis. A comparison of our 
lower bounds with the previously known upper and lower bounds can be found in 
Tables 5-2 and 5-4. 


5.1 Area Bounds 


One of the most important problems in the theory of VLSI is the determination 
of the minimum amount of area required to lay out a network on a chip. Given an 
arbitrary graph, this problem has two parts; namely, 


1) finding a good layout for the graph, and 
2 showing that the layout is optimal. 


There are. a variety of techniques known for finding good layouts for specific 
graphs [MR79, PV79, S79, HL80, MC80, PV80, SR80b, T80, BL81, KLLM81, 
LLM8], LM81, PRS81, T81], but the only known general technique is due to 
Leiserson [L80a,L80b]. In particular, he showed how to construct a good layout for 
any graph for which a good separator is known. (An N-node graph is said to have 
an f(N)-separator if it can be partitioned into two equal-sized subgraphs G, and G, 
such that at most AN) edges link G, to G, and both G, and G, have AN/2)- 
separators.) We have summarized Leiserson’s results in Table 5-1. 


There are two difficulties with Leiserson’s method. First, it is not always 
possible to find a good separator for a graph. For instance, a minimal O(N/logN)- 
separator was not found for the shuffle-exchange graph until after an optimal 
O(N2/log2N)-area layout was discovered. Secondly, the layouts produced by 
Leiserson’s technique are not always optimal — even if a minimal separator is 
known. For example, Leiserson’s technique requires O(N/ogN) area to lay out the 
N-node mesh, substantially more than is really needed. For the most part 
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Table 5-1 


Upper Bounds on the Layout Area of 
N-Node Graphs With Specified Separators 


upper bound 
separator - on layout area 
-N%a<I/2 © N 
N®, a=1/2 a Nlog’N 
N®, a> 1/2 N@a 


however, Leiserson’s method is a good one and certainly the most general 
technique currently available. 


Once a good layout for a network has been found, it remains to show that the 
Jayout is optimal. This is accomplished by proving a good /ower bound on the 
layout area of the network. The only known methods for proving such lower 
bounds are due to Thompson [T79,T80], Vuillemin [V80] and Lipton and 
Sedgewick [LS81]. They have concentrated on the related problem of proving 
lower bounds for the bisection width of a graph. (The bisection width of a graph is 
the minimum number of edges which must be removed in order to separate the 
graph into two disjoint and equal-sized subgraphs.) 


Thompson was the first to notice the relationship between bisection width and 
layout area. In particular, he showed that the wire area of a graph with bisection 
width b is at least Q(b?). In what follows, we. prove the slightly weaker (and 
simpler) result for layout area. . 


Theorem 5-1 (Thompson [179]): The layout area of a graph with bisection width | 
b is at least Q(b’). 


Proof: Consider an optimal layout of a graph G with bisection width b. Cut the 
layout horizontally so that precisely //2 of the nodes of G are above the cut. (For 
an example, see the diagram in Figure 5-1). Since at least b edges must cross the 
cut, the layout must contain at least b-/ vertical tracks. A similar argument 
reveals that the layout must also have at least &-/ horizontal tracks. Thus the area 
of the layout is at least (b-/)? = Q(6*) Oo 
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Figure 5-1: A horizontal bisection of a layout. 


Although the task of finding a good lower bound on the bisection width of a 
graph is difficult in general, Thompson [T79] was succesful in finding good 
bisection width lower bounds for a variety of computationally useful networks. — 
‘For example, he used information transfer arguments to show that any network — 
which is capable of computing the discrete Fourier transform on N elements in T 
steps must have bisection width at least 2(N/7). Among other things, he was thus 
able to conclude that at least 2(N2/log2N) area is required to lay out the NV-node 
shuffle-exchange graph. 


Thompson’s work has recently been extended; first by Vuillemin [V80] and then 
by Lipton and Sedgewick [LS81]. Vuillemin characterized a broad class of graphs - 
for which Thompson’s lower bound arguments can be applied while Lipton and 
Sedgewick showed how to use crossing sequence arguments to prove lower bounds 
for an even larger class of graphs. 


Although the methods of Thompson, Vuillemin, Lipton and Sedgewick are quite 
elegant and useful in establishing good bisection width lower bounds for certain 
graphs, their applicability is inherently limited to graphs for which the layout area 
is no more than a constant times as large as the square of the bisection width. 
Thus they have not been of use in resolving two of the key open questions in VLSI 
theory; namely, 


1) “How much area is needed to lay out a planar graph?" and 


2) "How much area is needed to Jay out a graph which has an O(N//2)- 
separator?,"” 
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The planar graph question is particularly important since, as we will show in 
Chapter 7, the layout problem of an arbitrary graph can be reduced to that for a 
planar graph. No nontrivial lower bounds have been found for either problem, 
however. As we mentioned previously, the best procedure known requires 
O(Niog?N) area to lay out an arbitrary N-node graph with an O(N/”2)-separator. 
As Lipton and Tarjan [LT77] have shown that every N-node planar graph has an 


O(N!”2)-separator, the O(N/og?N)-area layout procedure also works for planar 


graphs. Although it is suspected that better layout procedures exist for planar 
graphs, none have yet been found. 


In the thesis, we pursue an entirely different strategy in developing new lower 
bound techniques for VLSI. Whereas previous researchers have been concerned 
primarily with the bisection width of a network, we shall be concerned with its 
crossing number and wire area. Both are lower bounds on the layout area of any 
graph. In fact, we will show in Chapter -7 that 


Q(b) < ct+N<weA 


for any N-node graph with bisection width 5, posing number c, wire area w and 
layout. area A. 


The preceding inequality implies that every lower bound technique for the 
bisection width of a graph is also a lower bound technique for its crossing number 
and wire area. Thus nothing is lost by forgetting about bisection width and 
concentrating ones efforts on finding good lower bounds for the crossing number 
and wire area of a graph. In fact, much can be gained. For example, we will use 
such techniques to find 


1) an N-node planar graph which has layout area Q@(NlogN), and 


2) an N-node (nonplanar) graph with an O(N/””)-separator which has layout 
area O(Niog?N). 


The first result demonstrates that not all planar graphs can be laid out in linear 
area, thus disproving a conjecture thought by many to be true. The second result 
indicates that Leiserson’s O(Nlog?N)-area layout technique for graphs with 
O(N/””)-separators is optimal at least some of the time and thus cannot, in general, 
be improved. 


For easy reference, we have summarized our results.along with the previously 
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known upper and lower bounds in the following table. The upper bounds are due 
to Leiserson [L80a] and represent the maximal amount of area needed to lay out 
any graph with the designated property. The lower bounds, on the other hand, 
represent the minimal amount of area required to lay out a specific class of graphs 
with the designated property. The previously known lower bounds are, for the 
most part, trivial. The only exception is the N2@ bound which, as a corollary of 
Theorem 5-1, is due to Thompson [T79]}. 


Table 5-2 
Area Bounds 
previous our upper 
separator lower bound lower bound bound 
N®,a¢I/2 _N N 
N®, a= 1/2 N Nlog’N Nlog?N 
N®, a> 1/2 N20 N2a 


(planar) ° N NlogN Mlog?N 


5.2 Edge Length Bounds 


There has been a great deal of interest lately in the problem of minimizing the 
length of the longest wire in VLSI layouts [BL81,CM81,PRS8]]. It is not difficult 
to show that the length of the longest wire in any reasonable, area-optimal VLSI 
layout is at most a constant times the. square root of the layout area. (Otherwise, 
some wire would be longer than the perimeter of the layout, which is 
unreasonable.) Bhatt and Leiserson [BL81] recently found better layouts for graphs 
with small separators. We have summarized their results in Table 5-3. (For 
completeness, we have also included the trivial bound for graphs with large 
separators.) 


It is worth noting that the layouts which achieve the bounds in Table 5-3 
simultaneously achieve the best known bounds for layout area. Thus no /ayout 
area/maximum edge length tradeoffs are apparent. 
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Table 5-3 


Upper Bounds on the Maximum Edge Length of 
N-Node Graphs With Specified Separators 


upper bound. on 


separator maximum edge length 
N*, a < 1/2 N!/2/logN 
N&, a=1/2  N!72IogN/loglogN 
N®, a> 1/2 , Ne 


Very little has been accomplished in the way of lower bounds, however, since 
bisection’ width arguments do not seem to be applicable to edge length 
considerations. In fact, the only known lower bound for maximum edge length is 
the trivial lower bound derived from the diameter of a graph. (The diameter of a 
_ graph is the greatest distance between any pair of nodes in the graph where 
. distance is defined to be the length of the shortest path linking the pair of nodes.) 
The precise, lower bound is stated in the following theorem. 


Theorem 5-2: Any layout of a graph G with diameter d and mets area A has 
some edge of length at least A!/2/3d, 


Proof: Let I be any layout of Gand g be the length of the longest wire in I. 
We will use I’ to construct another layout '' of G which has at most 9d?q? area. 
Since any layout for G has at least A area, this will be sufficient to show that 
q > A!7273d. 


Since every pair of nodes in G is linked by a path of length d or less, we can 
conclude that every pair of nodes are within distance dg of each other in I. 
(Otherwise, some edge would have length greater than g in I, a contradiction.) 
Thus, all of the nodes are contained in some dq x dq square in I’. Since every 
wire which leaves the square must re-enter at some other point, we can conclude 
that at most 2dq wires can cross the boundary of the square at any point. By 
rewiring the portion of I which is outside the square, it is possible to produce a 
second layout ['' for G which has at most 2dq additional horizontal tracks and 2dq 
additional vertical tracks. (One additional horizontal track and one additional 


56 


vertical track are needed to replace each wire.) Thus the total atea of ['' is at most 
9d*q’. (As an example of how the rewiring should be done, we have included 
Figure 5-2.) 0 


peat (pc hee er ares ~~ “1 boundary 


| 


ey) 


Felice’ al tateeiee 


Figure 5-2: Rewiring the outer portion of a layout. 


It is not difficult to construct N-node graphs with {N)-separators which have 
logN diameter for any fN). By Theorem 5-2, any layout of such a graph must 
have a wire of length Q(NVlogN). Using crossing number and wire area 
arguments, however, we will find examples of graphs which Tust contain even 
longer wires. In particular, we will describe 


1) an N-node planar graph for which any layout must have a wire of length 
e( NiI72 Nog!/2 N), 


2) an N-node graph with an O(N/”*)-separator for which any layout must have 
a wire of length O(N/”2/ogN/loglogN), and 


3) an N-node graph with an O(N/-/”")-separator for which any layout must 
have a wire of length O(N/-/”) for any r>3. 


The latter two results achieve the known upper bounds for maximum wire 
length. They also indicate that some wires in some layouts must be very long 
(possibly as long as the length of the entire layout). . 


For convenience, we have summarized our edge length results along with the 


57 


previously known upper and lower bounds in Table 5-4. The upper bounds are 
due to Bhatt and Leiserson [BL81] while the lower bounds are all easy corollaries 
of Theorem 5-2. 


Table 5-4 


Maximum Edge Length Bounds 


previous our upper 
separator lower bound lower bound bound 
N®, a < 1/2 N!/2/logN N!“/logN 
N®, a=1/2 N!”/logN N!72logN/loglogN N!“2logN/loglogN 
— N® > 1/2 N®/logN Ne Ne 
Giana) N'7/logN N!72/flog!”2N Ni?2 logN/loglogN 
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CHAPTER 6 


NETWORK CONSTRUCTIONS 


In this chapter, we will describe the networks for which we will later establish 
layout area and maximum edge length lower bounds. As the networks are new 
and interesting in their own right, we will discuss each at some length. 


6.1 The 2-Dimensional Mesh of Trees 


The N-node 2-dimensional mesh of trees will be the first example of a graph 
with an O(N’72)-separator known to have layout area O(Wlog?N) and maximum 
edge length O(N//ogN/loglogN). 


6.1.1. Definition 


~The 2-dimensional nxn mesh of trees M> ,, (where n is assumed to be a power of 
2) is defined as follows. Starting with an nxn matrix of nodes and adding nodes 
wherever necessary, construct a complete binary tree in each row and column of 
the matrix. The trees should be constructed so that 


1) the leaves in each tree are precisely the nodes in the corresponding row or 
column of the original matrix, and 


2) the subgraph induced on the nodes in each quadrant is M>,,. . 


For example, we have drawn M, , in Figure 6-1. The nodes in the original 4x4 
matrix are represented by dots. The nodes which were added in order to form row 
trees are drawn as small triangles while those added to form column trees are 
shown as small squares. The row tree edges are drawn with solid lines while 
dashed lines represent edges of column trees. Notice that if we were to remove the 
roots of the row and column trees of M), and the edges incident to them, we 
would be left with 4 copies of Af, , one in each quadrant. In general, if we 
remove the nodes and edges in the top & levels of the binary trees in M),, ,* we 
will be left with 224 copies of M 2nz*- This important property of meshes of trees 
is used extensively throughout Chapters 7 and 8. 
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Figure 6-1: The 4x4 mesh of trees Mo 4 


6.1.2. Properties 
It is not difficult to show that the nxn mesh of trees M),, has 
1) N = 3n?-2n = Q(n2 nodes, 
2) bisection width n = O(N’) , 
3) diameter 4/ogn = O(logN) , and 
4) an O(N’”2)-separator. . 


By applying the methods discussed in Chapter 5, we can thus conclude that the 
N-node 2-dimensional mesh of trees has 


1) crossing number at most O(Nlog?N), 
2) layout area between Q(N) and O(Niog2N), and 


3) maximum edge length between Q(N!”7/JogN) and O(N!logN/loglogN). 


In fact, we will show in Chapters 7 and 8 that the N-node 2-dimensional mesh of 
trees has 


1) crossing number O(NiogN), 
2) layout area @(Niog*N), and 
3) maximum edge length O(N!”logN/loglogN). 


Thus the 2-dimensional mesh of trees is the first graph with an O(N//4) 
separator known to acheive the upper bound for layout area discovered by 
Leiserson [L80a] and the upper bound for maximum edge length discovered by 
Bhatt and Leiserson [BL8]]. 


6.1.3 Applications 


Computationally, the nxn mesh of trees is a very powerful network. Among 
other things, it can be used to 


1) multiply a fixed nxn matrix by m different n-vectors in m+ 2/ogn (word) 
steps, 


2) sort a list of n mrbit words in 2m+Slogn (bit) steps, and 
3) link n input terminals to n output terminals in any order in /ogn (bit) steps. 


The algorithms and processors needed for these operations are quite simple. For 
example, the processors needed for sorting and switching need only contain a few 
and and or gates while those for matrix-vector multiplication need only contain a 
word multiplier or adder. We describe the algorithms needed for these operations 
in the following three subsections. 


(a) Matrix-Vector Multiplication 


Given any fixed nxn matrix S=(s,), we will show how to program M) , to 
compute the product of S and any m input n-vectors in m+2logn (word) steps. 
As S is fixed, it is not considered to be part of the on-line input. Rather, it is 
considered to be part of the program (in the form of off-line input) and thus we 
assume that the value of s,;is initially stored in the (i/) leaf of M>,, for each i and 
j. The algorithm proceeds as follows. 


Given any input vector v=( v) . input the jth entry vj into the root of the jth 
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column tree for each j, /<j<n. Pass the entries of v down the column trees so that 
after /ogn steps, each leaf in the jth column tree has received the value of vj 
Computation of the n? products {s,v; | 1 <i, j< n} can now take place simul- 
taneously. Afterwards, we can find the entries of the product vector Sv by 
summing the values of the leaves in each row tree. This operation takes an 


additional /ogn steps. 


The total running time of the algorithm just described is J+2/logn. By 
pipelining the input vectors through the column trees and the output sums through 
the row trees, it is not difficult to see that m such products can be calculated in 
m+ 2logn _ steps. 


(b) Sorting 


The algorithm for sorting proceeds as follows. Starting at the roots, input (bit by 
bit) the ith word to be sorted into the ith row and column trees for each i, 1<i<n. 
Pass the bits down each tree so that after /ogn steps the leading bit of the ith word 
has reached each leaf of the ith row and column trees. Comparison of the ith and 
jth words for all i and j can now proceed simultaneously. After at most m 
additional steps, the (i,/) leaf has decided whether the ith word is smaller or larger 
than the jth word. Ties are broken arbitrarily (e.g., depending on the values of / 
and j). Once this is done, each leaf transmits a 0 or a 1 to its column tree father 
depending on whether its column tree word was smaller or larger than its row tree 
word. Each column tree then sums these values in order to determine the position 
of its word in the final ordering. (If the sum is carried out bit by bit starting with 
the least significant bit, this process takes 2/ogn steps.) This information is then 
used to mark a path in each column tree from the root to that leaf which is also in 
the appropriate row tree (again taking 2/ogn steps). It is now a simple matter to 
transmit the bits of the ith word along the unique path from the ith column tree 
root to the appropriate row root for each i As the paths are all pairwise disjoint, 
this process takes only m+ 2/ogn steps. 


The algorithm just described sorts a list of n m-bit numbers in 2m+ 7logn steps. 
It is a simple exercise to speed up the alogorithm to obtain the 2m+S5logn step 
bound. We should also point out that this algorithm is similar to the one described 
by Muller and Preparata in [MP75]. The VLSI implementation of the algorithm is 
new, however, and far superior to many of the VLSI sorting algorithms discussed 
by Thompson in his recent survey paper [T8]]. 
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(c) Switching 


Given the algorithm just described for sorting, it is clear how to program M) , to 
serve aS a switching network for n input and output lines. For example, assume 
that the ith input line is to be. connected to the jth output line for some / and j. In 
order to do this, we first hook up the ith input line to the ith column root. We 
next establish a path from the root of the ith cloumn tree to that leaf in the tree 
which is also in the jth row tree. This can be done by inspection of the binary 
representation b;--- digg, of the number j.. More precisely, at the kth level of the 
binary tree, we branch left or right depending on whether 5, is 0 or J 
(respectively). Lastly, we link the appropriate leaf of the jth row tree to the root of 
the jth row tree and then to the jth. output line (again taking /ogn steps). 


The algorithm just described takes 2/ogn steps to link n input lines to n output 
lines in any order. It is not difficult to show that if the row tree connections are 
hardwired in advance (i.e., by linking the root of each row tree to all of its leaves), 
then the input-output connections can be properly made in just Jogn steps. 


6.2 The Dimensional Mesh of Trees 


The N-node rdimensional mesh of trees (for r>2) will be the first example of a 
graph with an O(N%)-separator (for a>J/2) known to have maximum edge length 
O(N). : 


6.2.1 Definition 


The 2-dimensional mesh of trees can be easily generalized to higher dimensions. 
For example, the 3-dimensional nxnxn mesh of trees M3,, can be constructed as 
follows. Starting with an nmxnxn cube of nodes and adding nodes wherever 
necessary, construct a set of n? complete binary trees in each of the three 
dimensions of the cube. As before, the trees should be constructed so that the 
leaves are precisely the nodes of the original cube and so that the subgraph 
induced on each octant of nodes is M;,,.. The general rdimensional mesh of 

r 
trees M,.,, is formed from an nxnx- -- xn hypercube in a similar manner. In 
general, removal of the roots and edges which are in the top level of the binary 
trees will leave 2’ copies of M,,/ . 
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6.2.2 Properties 


= 
It is easily observed that the’r-dimensional fxnx.. xn mesh of trees M, ,, has 
(for bounded r) 


1) N = (r#J)n" - mm”! = O(n) nodes, 
2) bisection width n™! = O@(NTIY) , 

3) diameter 2rlogn = Q(logN) , and 
4) an O(n 4 ")-separator. 


Thus we can easily infer that the N-node rdimensional mesh of trees has (for 
bounded r) 


1) crossing number at most O(N7"N, 

2) layout area O(N2-2”), and 

3) maximum edge length between Q(N!-/“/logN) and O(N’), 
In fact, we will show in Chapter 7 that the graph has 

5. crossing number Ow2-2/ , and 

2 maximum. edge length @(N/-/%), 


Thus the rdimensional mesh of trees is the first graph with an O(N“)-separator 
(for a>J/2) known to achieve the trivial upper bound on maximum edge length. 


6.2.3 Application to Matrix Multiplication 


Computationally, the rdimensional mesh of trees is a very powerful network. 
For example, M,,, can be used to multiply m pairs of nxn matrices in m+ 2logn 
(word) steps. The algorithm is very similar to the one used by M 2n to compute 
matrix-vector products. It procceds as follows. 


At each time step, a pair of matrices is entered into the network via the roots of 
the trees in two of the dimensions (one dimension for each matrix). The entries 
are passed down through the trees so that after /ogn steps, the leaf in the (7,5) 
position of the cube contains the (r,s) entry of the first matrix and the (s,/) entry of 
the second matrix for each r,s and 7. All n? multiplications can then be performed 
simultaneously. The entries of the product matrix are then calculated by summing 
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the values of the leaves of each tree in the third (previously unused) dimension. 
This process takes an additional /ogn steps. As the network is easily pipelined, it is 
clear that the total computation time is just m+2logn (word) steps. 


6.2.4 A Further Generalization 


The rdimensional mesh of trees was defined as a natural generalization of the 
computationally powerful 2-dimensional mesh of trees. M,,, can also be viewed as 
a generalization of the rcube, also a very powerful communications network. For 
example, M, ) is an rcube with every edge replaced by a path of length 2. Viewed 
in this light, the ~dimensional mesh of trees motivates the definition of a shuffle 
tree graph in the same way that the rcube motivates the definition of the shuffle- 
exchange graph. Although we have yet to investigate this graph in detail, it is quite 
possible that it has important applications. 


(As an aside, we should caution the reader that the asymptotic estimates given in 
section 6.2.2 do not necessarily apply to Af, since r was assumed to be bounded. 
The correct estimates are not difficult to work out, however.) 


6.3 The Tree of Meshes 


The N- node tree of meshes will be the first example of a ve graph known to 
have O(NiogN) layout area, — 


6.3.1 Definition 


The tree of meshes is similar to the 2-dimensional mesh of trees in that it 
combines the structure of a mesh with that of a complete binary tree in a natural 
way. Unlike the 2-dimensional mesh of trees, however, the tree of meshes is a 
planar graph. It is formed by replacing each node of a complete binary tree with a 
mesh and each edge by several edges which link the meshes together. More 
precisely, the root of the binary tree is replaced by an nxn mesh (where n is 
assumed to be a power of 2), its sons are replaced by n/2 x n meshes, their sons are 
replaced by n/2 x n/2 meshes, and so on until the leaves are replaced by /x/ 
meshes, In the place of each right edge of the binary tree (i.e., one which links a 
node to its right son), we link the rightmost column of nodes in the mesh 
corresponding to the father to the topmost row of nodes in the mesh corresponding 
to the right son. Similar replacements are made for /eft edges of the binary tree. In 
both cases, the connections are made so as to preserve the column and row order 
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of the nodes and to insure that the resulting graph will be planar. The resulting 
graph is refered to as the nxn tree of meshes and will be denoted by 7, . For 
example, we have drawn T, in Figure 6-2. 


Figure 6-2: The 4x4 tree of meshes Tg. 


6.3.2 Properties 

It is easily seen that the — tree of meshes 7, has 
1) N = 2n*logn+n? = O(n’logn) nodes, 

2) bisection width n = O(N’/2/log!2N) 7 
3) diameter 8n = O(N’//og!/2N) , and 
4) an O(N!”7/log!/2N)-separator. 

Thus we can easily infer that the N-node tree of meshes has 
1) layout area between Q(N) and O(NlogN), and 
2) maximum edge length between Q(/og!72N) and O(N!““Iog!/2N). 


In fact, we will show that the graph has 
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1) layout area O(NlogN) and 
2) maximum edge length O(logN). 


The maximum edge length bound is fairly straightforward. We will show in 
Chapter 8 that the wire area of the N-node tree of meshes is O(N/ogN). As the 
graph has Q(N) wires, we can conclude that some of them must have length at least 
Q(/ogN). The lower bound can, in fact, be achieved by a straightforward 
modification of the H-tree layout for binary trees [MR79]. 


In section 6.4, we will show how to augment the N-node tree of meshes so that 
any layout will have to contain a wire of length at least Q(N//7/log!/2N). 


6.3.3 Applications 


The tree of meshes is a particularly interesting planar graph since it can embed 
arbitrary planar graphs much more efficiently than can the ordinary mesh. For 
example, it is not known how to embed an arbitrary planar graph in less than an 
‘@(Nlog?N)-node mesh. As we show in part (a) of this section, however, any N- 
node planar graph can be embedded in an O(NiogN)-node tree of meshes. 


The tree of meshes can also be used to embed many nonplanar graphs which 
have O(N/72)-separators. For example, we will show in part (b) of this section how 
to embed M2, in T>, for any n. This result will later allow us to give a simple 
proof that the N-node tree of meshes has wire area at least Q(NiogN). 


(a) Embeddings of Planar Graphs 


In [LT77], Lipton and Tarjan prove an O(N/”“)-separator theorem for the class 
of planar graphs. Recently, Bhatt and Leiserson [BL8]] generalized this result by 
showing that the class of planar graphs has an O(N//4)-simultaneous separator. 
(An N-node graph G is said to have an fN)-simultaneous separator if for any 2- 
coloring (say, black and white) of the nodes of G, there are disjoint subgraphs G, 
and G, of G such that G, and G, each contain //2 of the black nodes and 1/2 of 
the white nodes of G, at most AN) edges link G, to G,, and both G, and G, have 
AN/2)-simultaneous separators.) In the following theorem, we show that any N- 
node graph with an O(N/”4)-simultaneous separator can be embedded in an 
O(NilogN)-node tree of meshes. As a corollary, we will thus be able to conclude 
that any N-node planar graph can be embedded in an O(NiogN)-node tree of 
meshes. 
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Theorem 6-1: Every N-node graph with an O(N'”)-simultaneous separator can 
be embedded in an O(NlogN)-node tree of meshes. 


Proof: Let G be an N-node graph with an AN)-simultaneous separator (f{(N) 
will later be chosen to be O(N/”*) ). Partition G into two subgraphs G, and Gy in 
accordance with the usual separator theorem. Color the nodes of G, (G,) white or 
black according to whether or not they are linked to a node in G, (G;). (To be 
precise, we should also weight each node according to the number of nodes in the 
other subgraph to which it is adjacent.) Now use the simultaneous separator to 
partition G, and G). Proceed in this manner until only isolated nodes remain. At 
each step, color the nodes in the subgraph white if they are adjacent to some node 
outside of the subgraph and black if they are adjacent only to nodes within the 
subgraph. 


After the first step, at most {) edges will link each (N/2)-node subgraph to the 
other. After the second step, at most AN)/2+fN/2) edges will link each (N/4)- 
node subgraph to any other. Using induction, it is not difficult to show that after k | 
Steps, at most 


ANV2ET 4 fIN/2/ 2%? + (N/A 2K3 4 oe + INS 2 22 + ANZ2K4) 


edges will link each (N/2*)-node subgraph to any other. In particular, for AN) = 
O(N/”) , we can conclude that at most O(m/”) edges will link any m-node ~ 
subgraph produced by “this process to any other subgraph. 


Each subgraph produced by the above procedure corresponds in a natural way 
to a mesh of the tree of meshes. For example, G corresponds to the root mesh, G, 
and G, correspond to the second level meshes, and so on. In general, each m-node 
subgraph corresponds to an O(m)-node mesh. Thus each mesh can be used as a 
switching network to embed the O(m/” 2) edges which link the corresponding 
subgraph to other subgraphs. As an example of how this is done, we have 
included Figure 6-3. In each switching network, the edges entering from the top 
are linked to the edges entering from the sides. The nodes of G are embedded in 
the bottom levels of the tree of meshes 0 


Corollary 6-1: Every N-node planar graph can be embedded in an O(NlogN)-node 
tree of meshes. 


Proof: Obvious 0 
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(b) Embedding of M,, in T>, 


Although we have not worked out the details, it appears likely that any N-node 
‘graph with an O(N/”“)-separator can be embedded in an O(NlogN)-node tree of 
meshes. In section 7.4.3, we prove a slightly weaker result; namely that every N- 
node graph with an O(N/”2)-separator can be embedded in some O(NlogN)-node 
planar graph. 


Of particular importance, however, is the fact that M, ,, can be embedded in T>, 
for any n. For example, consider the embedding of M, , in Tg displayed in Figure 
6-4. The embedding has been drawn as though it were construted as part of a 
larger embedding (say of M)>) in order to illustrate the recursive nature of the 
general embedding procedure. In addition, the nodes and edges of M, , have been 
drawn as they appear in Figure 6-1. For clarity, we have represented the nodes of 
_ Tg as pinpoints and omitted its edges altogether. Also notice that we have not 
included the bottom two levels of Ty since they are not needed for the embedding. 


The embedding of M,, in T>, for arbitrary n>4 proceeds as follows. 


step 1: Remove the roots of the row and column trees of Af) ,, and all the edges 
incident to them. | 


‘Step 2:. Embed the four copies of M) 2 obtained from step 1 in four separate 
copies of 7,, by calling this procedure recursively. 


step 3: Embed the 2n roots of the row and column trees in the 2n x 2n mesh 
so that 


J) the column roots are located at positions (ji) for 1 <i< n/2 and 
3n/2 < i < 2n, and 


2) the row roots are located at positions — (21,2i-1) and (2i-1,21) for 
W4< i <¢ 3n4. 


step 4: Draw left and right horizontal edges from each column root to the left 
and right outer columns of the 2n x 2n mesh and then to the appropriate node in 
the top row of the corresponding mn x 2n mesh. Similarly draw two left edges 
from each row root with position (2i-/,2i-/) for some i and two right edges from 
each row root with position (2i-/,2i) for some i. 


step 5: The nx 2n meshes are used as switching networks. In particular, we 
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‘igure 6-4: The embedding of Mj 4in Ts. 
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use them to make the following connections: 
1) Ud to (£2) for 1 < i<n/4 (column tree connection) 
2) (i to (i+n/2, 1) for n/4<i< n/2 (column tree connection) 
3) U, 2i- 1) to (41) for n/4< i< 3n/4 (row tree connection) 
4) (/, 2’) to(é 2n) for n/4<i< 3n/4 (row tree connection) 
5) (1, to (5n/2- i+], 2n) for 3n/2K i<7n/4 (column tree connection) 
6) (i) to (2n- i+], 2n) for 7n/4<i< 2n (column tree connection) 


step 6: Each nx 2n mesh can be easily linked to two copies of 7, , each of 
which contains an embedding of M) ,,) produced by this procedure. In particular, 
attach the wire leaving via the ith row of the nx 2n mesh to the node in the ith 
column of the appropriate nxn mesh of T,, for each n. (Note that the nodes in the 
nxn meshes are roots of M>,,,. and will become second level nodes of M),) 


6.4 The Augmented Tree of Meshes 


As we mentioned in section 6.3.2, the N-node tree of meshes can be laid out so 
that every wire has length at most O(/ogN). By slightly modifying the graph, 
"however, it is possible to increase the maximum edge length dramatically. The 
basic idea is to add a complete binary tree with n? leaves to the nxn tree of meshes 
so that the leaves of one are linked in a one-to-one fashion with the leaves of the 
other. It is important that the attachments between the two graphs be made so that 
the resulting graph (which we call the nxn augmented tree of meshes T,,') is planar. 
For example, we have drawn the 4x4 augmented tree of meshes in Figure 6-5. 


It is easily seen that the augmented tree of meshes has, up to a constant, the 
same bisection width, diameter, separator, layout area and number of nodes as does 
the original tree of meshes. By adding the binary tree, we have simply decreased 
the distance between any two /eaves of the tree of meshes. In Chapter 8, we will 
show that any layout of the N-node tree of meshes has two leaves which are spaced 
at least Q(N!/2/og!/2N) apart. We will thus be able to conclude that the maximum 
edge length of 7,,' is at least Q(nlogn) = Q(N’”*/log!/2N) . Using the techniques 
developed by Bhatt and Leiserson in [BL8]], it is not difficult to show that the 
lower bound is attainable. 
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Figure 6-5: The 4x4 augmented tree of meshes T,'. 
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CHAPTER 7 


CROSSING NUMBER ARGUMENTS 


In this chapter, we demonstrate the power of the crossing number as a lower 
bound technique for VLSI. We commence by showing that the crossing ‘number i 1S 
at least as large (up to a constant) as the square of the bisection width, In section 
7.2, we describe a powerful method for finding crossing number lower bounds. 
This method is then used in section 7.3 to find tight lower bounds on the crossing 
numbers of a variety of networks. We conclude in section 7.4 with a collection of 
miscellaneous results. Included are additional upper and lower bounds for the 
crossing number of a network as well as a procedure for embedding an arbitrary 
N-node graph with an O(N//”2)-separator in an O(NlogN)-node planar graph. 


7.1. The Relationship Between Crossing Number and Layout Area 


We first show that crossing number arguments are at least as powerful as 
bisection width arguments in establishing lower bounds for layout area. 


Theorem 7-1: If G is an N-node graph with crossing number c and bisection 
width b, then c+N > (6%). 


Proof: Let D be a drawing of G in the plane with c crossings. Replace each 
crossing of D with an artificial node. Call the resulting graph G' and note that it 
has precisely c+ N nodes. Using the weighted version of the Lipton-Tarjan planar 
separator theorem [LT77], it is possible to bisect the real nodes of G' (by assigning 
weight / to the real nodes and weight 0 to the artificial nodes) without cutting 
more than O((c+N)/”2) edges. After replacing the artificial nodes with their 
original edge crossings, it becomes apparent that we have, in fact, constructed an 
O((c+N)!7) bisection for G. Squaring, we find that c+N > Q(b2) 0 


Using a similar proof technique, we can show that the crossing number is also 
close to an upper bound for the layout area of a graph. In fact, should a really 
good layout algorithm for planar graphs be found, then the following result could 
become useful in laying out arbitrary graphs. 
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Theorem 7-2: Given an optimal drawing D for an N-node graph G with crossing- 
number c, it is possible to construct a layout for G with area at most 
O((c+ N)log*(c+.N)). Should a procedure be found which lays out an arbitrary N- 
node planar graph in A(N) area, then we could construct a layout for G with area at 
most O(A(c+N)). 


Proof: As in the proof of Theorem 7-1, we replace each edge crossing of D with 
an artificial node. The resulting graph G' has c+WN nodes and is planar. Using 
the methods developed by Lipton and Tarjan [LT77] and Leiserson [L80a], G' can 
be laid out in O((c+ N)log(c+N)) area. It is then a simple matter to replace the 
artificial nodes with their original edge crossings to obtain the desired layout for G. 
Alternatively, should an A(N)-area planar graph layout procedure be discovered, 
we could construct an O(A(c+N))-area layout for 'G O 


As we have just seen, the idea of replacing edge crossings with artificial nodes is - 
simple but powerful. Jai-Wei and Rosenberg have also employed this strategy in 
their work with embeddings of graphs in binary trees [JR81]. 


7.2 A General Method for Proving Lower Bounds 


In this section, we will describe a general method for proving crossing number 
lower bounds. A variant of this method will later be used to prove lower bounds 
for bisection width and wire area. The basic idea is as follows. . 


Given a drawing D for an N-node graph G, we will construct a drawing D' for 
the complete graph on N nodes Ky by tracing over the edges of D. For example,. 
we have done this for the 4-node graph shown in Figure 7-1. The edges of the 
original graph are drawn with dashed lines while. solid lines indicate edges of K, . 


If we are careful not to trace over each edge of D too many times during the. 
construction of D', it may be possible to infer somcthing about the number of 
crossings in D by counting the number of crossings in D'. This is due to the fact 
that the number of crossings in D is closely related to the number of crossings in. 
D'. For example, if e, and e, are edges of G which cross in D and e; is traced 
over s, times while e, is traced over s, times, then the crossing of e, with e, will 
appear 5,5, times in D'. Such a crossing of D' is called a crossing of the first kind. 
For example, there are four crossings of the first kind in the drawing of Ky in 
Figure 7-1. 
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crossings of 
the first kind 


eleva of the 
second kind 


Figure 7-1: Construction of K, from the drawing of a 4-node graph. 


Sometimes, it is necessary for two edges of D' to cross while traversing the same 
edge of D. Such a crossing is called a crossing of the second kind. Note that there 
is only one crossing of the second kind in the drawing of K, in Figure 7-1. Since 
D' can easily be drawn so that no pair of edges cross each other more than once, 
there are usually not very many crossings of the second kind. More precisely, if G 
has edges e,,..., e, and if edge e; is traced over s, times for each / during the 


construction of D', then D' can have at most Bs 2/2 crossings of the second 
kind. For most applications of the method, this number is substantially smaller 
than the number of crossings of the first kind in D' and thus we usually do not 
have to worry about crossings of the second kind. 


By showing that the number of crossings in D' is large, we can conclude that 
there must be a large number of crossings in D. For example, if each edge of D is 
traced over at most s times during the construction of D' and D' is found to have 
y crossings, then we can conclude that D has at least y/s* crossings. This follows 
from the fact that each crossing of D is replicated at most s* times in D'. (Note 
that we have neglected crossings of the second kind in this argument.) 


Fortunately, it is easy to find a good lower bound on the number of crossings in 
any drawing Of Ky . We state the result formally in the following lemma. The 
proof can also be found in Kleitman’s work [K70] but is generally regarded as 
folklore. 
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Lemma 7-1 (Kleitman [K70]): The crossing number of Ky. the complete graph 
on N nodes, is at least N(N-1I)(N-2)(N-3V/120 for N>S. 


Proof: Let D be a drawing of Ky in the plane with the smallest possible number 
of crossings c(V). We may assume that no pair of edges which cross in D are 
incident to a common node. Otherwise, it would be possible to produce a drawing 
D' for Ky with c(N)-/ crossings by exchanging the parts of the crossing edges 
which lie between the common node and the point of crossing. This would 
contradict the minimality of c(N). 


Consider the N subdrawings of D obtained by deleting one of the nodes and all 
of the edges incident to it. Note that each crossing of D appears in precisely N-4 
of the subdrawings. (A crossing does not appear in any of the 4 subdrawings 
which correspond to the deletion of a node incident to an edge of the crossing.) 
Since each of the subdrawings is a drawing of Ky.;, each must have at least c(N-/) 
crossings. Thus (N-4)c(N) > Ne(N-1). Applying the inequality recursively and 
noting that c(5)=/, we can conclude that 


(N) > [NAN-4)] [(N-DAN-5)]- + - [672] 
N(N-I\\N-2)(N-3/120 for N>S oO 


13 Applications 


Using the technique described in the previous section, it is possible to prove 
crossing number lower bounds for a variety of networks. In particular, we will 
prove lower bounds for the shuffle-exchange graph, the 2-dimensional mesh of 
trees and the rdimensional mesh of trees. We commence with the shuffle- 
exchange graph. 


7.3.1 Lower Bounds for the Shuffle-Exchange Graph 
Our main result in this section is the following. 


Theorem 7-3: The crossing number of the N-node shuffle-exchange graph is 


O(N2/log2N). 


Proof: As we showed in Part I of the thesis, the N-node shuffle-exhange graph 
has layout area O(N*/log?N). Thus O(N?//og2N) is an upper bound for the 
crossing number. In what follows, we will use the method of section 7.2 in order to 
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show that the crossing number -of the N-node shuffle-exchange graph is at least 
Q(N/log2N). 


Let D be any drawing of the N-node shuffle-exchange graph G where N= 2k. 
We first show how to construct a drawing D' of Ky on the nodes of G without 
tracing over any edge of D more than NiogN times. 


Given any pair of nodes a,---a, and b,---5,, draw the edge from 
a,---a, to b-.-- 6b, along the path 


Og + + 30,0) —? Ag--- A307, —> byay-- +434, —> byag--- a3b) —> 
byb;a,--+ a3 _ see —_ by.y +++ b pb by —_ by by.7 +++ 5b, . 


(In order that every edge of Ky not be drawn twice, we should assume that the 
value of a,--- a, is less than that of b,--- 5, but this has no bearing on the 
argument.) 


Wherever a;= b; for some i, the preceding path will have a loop. When actually 
drawing the edges of D', we ignore such loops. For example, the edge from 0/100 
to 1110] is drawn along the path 


01100 => o1101 -£> 10110 = 01011 => 10101 => 11010 <> 
: Holl => 11101. 


For convenience, we have labeled the shuffle edges with an => and the 
exchange edges with an > . Note also that we have omitted loops at JO/10, 
01011 and 4/0101 . 


It is not difficult to show that every edge of D is traced over at most NlogN 
times during the construction of D'. For example, consider the shuffle edge 
linking ay--+@ a, to aja,-++a). It is traced over during the construction of 


edges of D' which link a node of the form 
ki 


Fes ee 
Api?" 47: o +g 
to a node of the form 
i 


ia 
eo Ayes Oy 54 2 


for any i, 1<i<k (where * indicates either a 0-bit or a /-bit). It is easily seen that 
there are at most k2* such edges in D 'and thus cach shuffle edge is traced over at 
most NiogN times. A similar argument shows that each exchange edge is also . 
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traced over at most NiogN times. 
Since each edge is traced over at most NiogN times, there can be at most 
(3N/2) [(NlogN)*/2] = 3N3/(4log?N) 


crossings of the second kind in D'. This is substantially less than total number 
Q(N*) of crossings in D'. Thus D' must have 2(N*) crossings of the first kind. 
As each edge of D is traced over at most NiogN times, this means that D has at 
least Q(N4/(NiogN)?) = Q(N7/log?N) crossings O 


As the N-node shuffle-exchange graph has O(N) edges, we can conclude from 
Theorem 7-1 that some edge of any layout for the graph must cross at least 
Q(N/log?N) other edges. We do not know whether or not this bound can be 
achieved, however. The only known layouts for the N-node shuffle-exchange 
graph have edges which cross at least Q(N//ogN) other edges. 


It is also worth pointing out that the preceding argument can be used to prove 
that the N-node shuffle-exchange graph has bisection width at least Q(N//ogN). 
The result follows from the observation that Kj has bisection width @(N) and the 
fact that every edge of D was traced over at most NlogN times during the 
construction of D'. This means that the bisection width. of the N-node shuffle- 
exchange graph is at least {(N2/(NlogN)) = Q(N/logN), as claimed. 


In fact, a similar modification of the method described in section 7.2 can be used 
to find tight bisection width lower bounds for a// of the networks we have 
investigated. For most of these networks, however, it is much more useful to study 
the corresponding crossing number and wire area bounds. 


7.3.2. Lower Bounds for the 2-Dimensional Mesh of Trees 


In this section, we use a more sophisticated version of the method of section 7.2 
to prove a nontrivial lower bound on the crossing number of the 2-dimensional 
mesh of trees. 


Theorem 7-4: The crossing number of the N-node 2-dimensional mesh of trees is 
at least Q(NiogN). 


Proof: As before, let Mf, ,, denote the 2-dimensional mesh of trees (where n is a 
power of 2). We will show that the crossing number of M),, is at least 
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(n2logn - 121n?+121n/40 for all n>. 
Since M>,, has N=@(n’) nodes, this will be sufficient to prove the desired result. 


The proof consists of two steps. In the first, we show how to construct a drawing 
of K,2 from any drawing of M 2n by tracing over the edges of M ee We then 
apply Lemma 7-1 to conclude that there are a large number of crossings among the 
edges in the top levels of the binary trees of M>,, . In the second step, we 
complete the proof by inductively applying the result of the first step. 


step 1: Let D be any drawing of M),, in the plane. From this drawing, we can 
construct a drawing D' of K,,» in the following way. First locate the n? leaves of 
the binary trees of D. They will serve as the nodes for K,,2. Given any pair (i) 
and (k,/) of these nodes, draw an edge from (é/) to (kJ along the unique path 
from (ij) to (4) in the ith row tree of D and then from (iJ to (k,/ in the Ith 
column tree of D. (In order that each edge not be drawn twice, we shall assume 
that i<k and, when i=k, that j</) As usual, we assume that the edges of D' are 
‘drawn so that no pair cross each other more than once. 


We next count the number of crossings of the second kind in D'. In order to 
do this, we need to count the number of times each edge of D is traeed over during 
the construction of D'. It is not difficult to show that each edge in the ith level of 
a binary tree of M 2n (henceforth, referred to as a type i edge) is traced over at most 


nZi(n?-n?2h) < n3zi 
times for any ‘</ogn during the construction of D'. Thus at most n°22"! crosses 
of the second kind can occur at any type i edge of D. Since there are 2/+/n type i 


edges in M,,,, we can conclude that the total number of crosses of the second kind 
in D' is at most 


logs sogn 
S24 Iny(nbz 2) _ nS < n’. 
der ete 


We next count the number of crossings of the first kind (i.e., those 
corresponding to crosses in D). We say that a crossing of D is type i-j if it is the 
crossing of a type / edge and a type j/ edge. Let lj denote the number of type i 
crossings in D and set (;= Ql - Since each type / edge is traced over at most n?2? 
times, each type i-/ crossing of D produces at most (#?2(n324) = n®2") crosses 
of the first kind in D'. Thus the total number of crossings of the first kind in D' 
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is at most 
legn logn : 
BS ert, < < re BI%, 
ay) Jte 


Summing, we find that the total number of crossings of either kind in D'is at 


most. 7’ +f 377%, . By Lemma 7-1, this number must be at least 
n(n?-1)(n2- Nn2- 3/120 for n2>5. Simplifying, we can conclude that 
. Jign 
D22t, > (n?-121n/120 for n>6. 
é: 4 


Let s,= Si, be the number of crossings involving at least one edge from the 
top k levels of some binary tree of M>, . In what follows, we will use the 
preceding inequality to show that s, > (n2- -121n)k/40 for at least some value of 
k>I1. Assume otherwise and observe that 

/eaa 
3 reir, = Sy ris - SS; D 
ces 
where Ss, is defined to be 0. The coefficient of each s; (i=0) in this sum is 2 21.7 ak 
2 which is positive so for each i we may substitute (n?-/2/n)i/40 as an upper 
bound for s; in order to see that 


2%, € [(n?-121n/40] S40) 
= [(n?-121n)/40] ‘Se, 


yn 
Since 24! < 1/3 for all n, we can conclude that 


logan 
pe, < (n?-121nV120 for all n>121, 
a contradiction. Thus for all n>/2/, there is a K>/ such that s, > (n?-121n)k/40. 


step 2: Let c(n) denote the crossing number of M),,,. Using the result of step 1, 
we will now show by induction on n that c(n) > (n2logn - 121n?+121n)/40 for all 
n>, 


As (n*logn - 121n?+121n)/40 is nonpositive for small n, the lower bound 
trivially holds for all n</28, Assume that the lower bound holds for all m<n where 
n>128 and let D be any drawing for Af,,,. By counting the crossings of D in two 
groups according to whether or not at least one edge of the crossing is contained in 
the top k levels of the binary trees of M 2n > We can observe that 
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dn) > 2konr*) + sy. 


(Recall the definition of s, and the structure of M,,,.) By choosing & as in step 1 
so that s, > (n?-121n)k/40 and applying the inductive hypothesis for c(nZ k), we 
obtain 


cn) > 2°k(n222k(logn-k)/40 - 121n?22k40 4 121n2*/40] + n2k/40 - 121nk/40 


V 


nlogn/40 - 121n*/40 + 121n/40 + 121n(2k-k-1/40 
(n@logn - 12In? + 12In)/40 . 


IV 


Thus the inductive hypothesis is established and we can conclude that the 
crossing number of M,,, is at least Q(n7logn) = Q(NlogN) O 


In section 7.4.3, we will show that the crossing number of any N-node graph 
with an O(N//*)-separator is at most O(N/ogN). Thus, we will be able to conclude 
that the crossing number of the N-node 2-dimensional mesh of trees is precisely 
_O(NlogN). 


7.3.3 Lower Bounds for the “Dimensional Mesh of Trees 


By modifying the proof of Theorem 7-4, it can be shown that any layout of the 
r-dimensional mesh of trees must have very long wires. In particular, they must be 
as long as the width of any optimal layout for the graph. We state this result more 
precisely in the following theorem. j 


Theorem 7-5: Any drawing of the N-node r-dimensional mesh of trees contains 
an edge which crosses at least Q(N'!/" other edges. 
Y 


Proof: The rdimensional ‘fxax---xn mesh of trees M,, has 
N = (r#J)n’ - rn™! = O(n") nodes for bounded r. We will show that any layout 
D of M,,,, contains an edge which crosses at least .Q(n™!) = Q(N!-/”) other edges, 
thus proving the theorem. The method used is very similar to that of Theorem 7-4. 


As we did for the case of r=2 in Theorem 7-4, we first construct a drawing D' of 
the complete graph on the n’ leaves of M,,,. Each type i edge of D is traced over 
at most n’*!/2‘ times by this procedure. Thus the total number of crossings in D' 
is at most 

Jogn ; 
(rn3tt y2 4+ ert 2372p 


est 
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y 


teqn 

where, as before, i= ly and ¢,; is the number of type é/ crossings in D. 
= fn 

Applying Lemma 7-1, we can conclude that 3127, > Q(n?r2) . 


Let 5,= Si, be the total number of crossings of D involving an edge from the 
top k levels of the binary trees in M,,,. Using arguments similar to those used to 
prove Theorem 7-4, it is not difficult to show that for large n, there exists a k such 
that s, > Q(n2"224). As there are only rn™/(2k+!-2) edges in the top k levels of 
M,., for any k, we can conclude that at least one of them crosses at least Q(n™!) 
other edges 0 


It is worth pointing out that the preceding arguments can also be used to show 
that the crossing number of the N-node r-dimensional mesh of trees is O(N72”) 
for bounded P2. 


7.4 Further Methods 


In this section, we describe some additional methods for proving crossing 
number bounds. We first generalize Lemma 7-1 to prove a combinatorial lower 
bound on the crossing number of any N-node graph with at least 4N edges. This 
result is then used in section 7.4.2 to prove crossing number lower bounds for a 
class of graphs which are similar to the 2-dimensional mesh of trees. We conclude 
by proving a nontrivial upper bound on the crossing number of graphs which have 
O(N//*)-separators. As a corollary, we wiil show that any N-node graph with an 
O(N!”2)-separator can be embedded in some O(N/ogN)-node planar graph, thus 
generalizing Theorem 6-1. 


7.41 A Combinatorial Lower Bound for Crossing Numbers 


In this section, we substantially generalize the result of Lemma 7-1. 
Throughout, we assume that G is a simple graph (i.e., that it has no loops or 
multiple edges). 


Theorem 7-6: If G isa graph with E edges and N nodes where E>4N, then the 
crossing number of G is at least E3/375N?. 


Proof: The proof is by induction on N. For N=/, the result is vacuously true. 
Assume that the result is true for all N'<N where N>/ and let G be a graph with 
N nodes and E edges where E>4N. We will show that the crossing number c of 
G is at least £°/375N°, thus proving the theorem.. There are two cases to 
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consider. 
caseel: 4N< E< SN. 


We first use Euler’s formula [BLW/76] in order to show that the genus of G is 
large. Euler’s formula states that 


E+2=N+f+ 22 


where fis the number of faces of any proper embedding of G on a surface of 
genus g. Since G has no loops or multiple edges, every face contains at least 3 
edges and thus 3f<2E. Substituting, we find that 


2g = E+2-N-f 
> E+2-N- (2E/3) 
E/3+2-N 


and thus that g>(E-3N)’6. For 4N< E< SN, it is not difficult to show that 
(E-3N)6 >E3/375N? and thus that g > E3/375N2, 


Given any graph with crossing number c, it is possible to find a proper 
embedding of the graph on a surface with genus c. We can do this by drawing the 
graph on a sphere so that only ¢ pairs of edges cross and then putting a "handle" 
in the region immediately surrounding each crossing. The edges of the crossing 
can then be redrawn through the handle so that they no longer cross. As the 
resulting surface has genus c, we can conclude that g<c for any graph with genus g 
and crossing number c. In particular, we can conclude that c > E3/375N* for G. 


case 2; E> SN. 


Let d,;,..., dy be the degrees of the N nodes of G and let D be an optimal 
drawing of G. As usual, we can assume that no pair of edges which cross in D are 
incident to the same node of G. Consider the subdrawing D, of D obtained by 
deleting the ith node of G and all the edges incident to it. This subdrawing is also 
a drawing of a graph with N-/ nodes and E-d; edges. Since E>5N and d;<N-1, we 
can conclude that 


E-d, > 4N+1 > 4N-D). 


Thus we can apply the inductive hypothesis to D; in order to conclude that it has at 
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least (E-d)3/1375(N=1)2] crossings.” 


Each crossing of D will appear in precisely N-4 of the N subdrawings of D 
produced by the above procedure. Applying the technique used to prove Lemma 
7-1, we can thus conclude that 


c > [IAN-4)] S (E-dyPB75(N-1) 
= [1/375(N-4\(N-1)?] | S(E3 - 3E*d, + 3Ed? - d,) 
= [1/375(N-4(N-1)] [E3N - 3EX(2E) + $(3Ed? -d?)) . 


AJ “ 
Since 2E = }id;,, it is not difficult to show that 3 (3Ed?-d3) attains its 
ry) re 
minimal value when d, = 2E/N for I<i<N. At this point, 


$(3Ed?-d3) > 12E9/N - 8E3/N? 
and thus . 
c > (E3N-6E3+12EIN - 8E2/N*) /[375(N3 - 6N2+9N - 4)] . 
For N>2, this expression can easily be reduced to show that c > £3/375N? 0 


‘It is interesting to note that the lower bound proved in Theorem 7-6 is (up to a 
constant) tight. For example, the N-node graph consisting of N2/E disjoint copies 
of Ken has O(E) edges and crossing number at most O(E7/N2) for any E>4N. 


7.4.2 Applications 


When defining the 2-dimensional mesh of trees, we required that the binary 
trees be interconnected so that M>,, contain 22k disjoint copies of Mp ,7* as 
subgraphs for any k. Not only is this definition the most natural, but it also allows 
us to use induction in the lower bound proofs for the network. Surprisingly, 
however, the constraint is not necessary in order to show that M),, can perform 
matrix-vector multiplication, sorting or switching in O(/ogn) time. In fact, any 
network consisting of n row trees and n column trees which share the same set of 
leaves can do these operations quickly. Thus it is conceivable that some other 
arrangement of the tree interconnections might lead to a network with a smaller 
crossing number. In what follows, we use Theorem 7-6 to show that this is not the 
case, 
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Theorem 7-7: If G is an N-node graph formed in the same way as the nxn mesh 
of trees except that arbitrary interconnections are allowed between the leaves of the 
binary trees, then G must have crossing number at least Q(NlogN). 


Proof: Let G, denote the subgraph of G obtained by deleting the nodes and 
edges in the top k levels of the binary trees of G for 0<k<logn. For example, if 
G=M),,,, then G, consists of 22* disjointcopies of My ,7 . Otherwise, G, is a. 
graph for which each node of the original nxn matrix of nodes is a leaf of a 
horizontal complete binary tree of depth Jogn- k and a leaf of a vertical complete 
binary tree of depth /ogn- k. For each k, let H, denote the graph whose nodes 
are the n? leaves of G, and whose edges are the paths in G, of the form 


leaf~ path in horizontal binary tree ~ leaf— path in vertical binary tree — leaf. 


Note that if G=M,,,, then H;, consists of 27* disjoint copies of K,272k . In any 
case, H, is a regular graph for which each node has degree n?27k-] , 


Given any drawing D, of G,, it is easy to construct a drawing D,' for H, by 
tracing over the edges of G, in the natural way. It is not difficult to see that each 
type i edge of G is traced over at most (2/08"-k)327(r-k) = 322k! times by this 
procedure for k. Thus each type i-/ crossing is reproduced at most n°274t < 
noz4k-2i times for j>i>k. 


Given any drawing D of G, construct 26 separate drawings D,' of H, for each 
k>0. Each type i crossing of D will appear a total of 


e-! 


Sind 74k 226k) = Pi $172k 
K=o aro 


O(n‘) 


lA 


times in these drawings. In what follows, we will show that there are at least 
Q(n8logn) total crossings of the first kind in these drawings. We will thus be able 
to conclude that the crossing number of G is at least Q(n7/ogn). 


As H, has Ey = O(n4274) edges and Ny = n? nodes, we can apply Theorem 
7-6 to conclude that D," has at least Q(E,3/N,2) = (82) crossings. Thus 
there are at least 2(n*) crossings among the 2* drawings D,' . Summing over k 
for 0<k<logn, we find that there are at least Q(n°/ogn) total crossings among all of 
the drawings {D,'| 0<k</ogn }. It is not difficult to check that there are at most 
O(n72°54) crossings of the second kind in each drawing of H1,. As there are 26k 
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such drawings for each k, we can conclude that there are at most 
logn 
S725 << O(n) 
Keri 


total crossings of the second kind in all the drawings {D,'| 0<k<logn }. Thus 
there are at least Q(n*/ogn) total crossings of the first kind and the crossing number 
of G is at least Q((n8logn)/n5) = Q(n7logn) = Q(NlogN) O 


As a corollary, we can see once again that the crossing number of M) , is at least 


Q(Nlog?N). 
7.4.3 An Upper Bound for Crossing Numbers 


Since any N-node graph with an O(N®)-separator for some a>J/2 has an 
O(N)-area layout, we can easily see that it also has crossing number at most 


O(N22). By Theorem 7-1, we can conclude that this bound is tight since many _. 


such graphs also have bisection width at least 2(N®). 


The situation is not as clear for graphs with O(N/”2)-separators, however. For 
example, the best known upper bound on the layout area of an N-node graph with 
an O(N!”2)-separator is O(Nlog?N) yet no such graph is known to have a crossing 
number greater than Q(Nl/ogN). In what follows, we prove a tight upper bound on 
the crossing number of any such graph. 


Theorem. 7-8: The crossing number of any N-node graph with an O(N!”2)- 
separator is at most O(NlogN). 


Proof: Given such a graph G, we will construct a drawing for G with at most 
O(NlogN) crossings. In order to construct the drawing, we will 


1) decompose G into subgraphs according to the separator theorem, 
2) draw the subgraphs by recursively calling the procedure, and 


3) draw the edges which link the subgraphs together without introducing too 
many crossings and so that every node remains "close" to the exterior of the 
drawing. 


In order to illustrate the procedure, we will describe in detail how drawings D, 
and D, of two m-node subgraphs are used to construct a drawing D of the 
combined 2m-node subgraph. Let c(#) denote number of crossings in D, or D,, 
whichever is larger. Further let dm) denote the maximum number of edges which - 
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must be crossed in order to draw an edge from any node in D, or Dp to the 
exterior of D, and D,. Constnict D from the drawings of D, and D, by drawing 
in the O(m!7) edges which link them together in the best way possible. Now let 
c(2m) and d(2m) be the obvious values for the constructed drawing D. It is not 
difficult to show that : 


2m) < 2m) + O(m) + O(m!din)) 


and that 
K2m) < dm) + O(m'”%), 


Solving the recurrences in general, we find that dm) < O(m//?) and thus that 
c(m) < O(mlogm) . Thus the above procedure can be used to find a drawing for G 
with at most O(NiogN) crossings O 


Using the preceding result, we can substantially generalize Theorem 6-1. 


_ Theorem 7-9: Any N-node graph with an O(N'””)-separator can be embedded in 
an O(NlogN)-node planar graph. 


| Proof: Construct a drawing of the graph with O(N/ogN) crossings according to 
the method described in the proof of Theorem 7-8. Replace each edge crossing in 
the drawing with an artificial node. The resulting graph has O(N/ogN) nodes, is 
planar and embeds the original graph O 
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CHAPTER 8 


WIRE AREA ARGUMENTS 


In this chapter, we extend the method of section 7.2 to prove lower bounds on 
the wire area of a variety of networks. In each proof, we will use a layout of a 
network to produce a layout for the complete graph. By showing that the nodes of 
the layout are widely spread out, we will be able to conclude that the wire area of 
the layout for the complete graph is very large. Provided that the edges of the 
original network were not traced.over too many times, we can then reason that the 
wire area of the original network is also large. 


8.1. Lower Bounds for the 2-Dimensional Mesh of Trees 


In this section, we find tight lower bounds for the layout area and maximum 
edge length of the 2-dimensional mesh of trees. . 


Theorem 8-1: The wire area of the N-node 2-dimensional mesh of trees is at least 


O(Nlog?N). 


Proof: As usual, we denote the nxn mesh of trees by M;,. In addition, let 
w(n) denote the wire area of M>,, and let a be a positive constant such that 


(*) @ 
(es) a < 242082) for all ipl 


n/(4log’n) for all n>2, and 


lA 


A 


ad 
where B= > j “2, also a constant. Clearly such a constant exists (a=2 7? should 
suffice) and a Wn) > anlog’n for n=1 and 2. Consider a value of n>4 
which is a power of 2 and assume that for all values of m<n which are powers 2 
that u(m) > am?log?m . We will use induction to show that Wn) > an2log?n . 
Since M,,, has N=@(n*) nodes, this will be sufficient to prove the theorem. 


Consider any layout for M,,, which uses w{n) wire. Partition the layout into 
three vertical strips V,, V,; and V, so that the center strip contains 3n2/4_ leaves 
and each outer strip contains n2/8 leaves. Similarly partition the layout into three 
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horizontal strips H),H, and H, so that the middle strip contains 3n’/4 leaves 
and each outer strip contains 77/8 leaves. For example, see Figure 8-1. 


Figure 8-1: Partitioning of the layout for M 2n- 


Let p denote the length of the longest side of the center block formed by the 
intersection of V; and H,. Without loss of generality, we assume that the longest 
side is horizontal. In what follows, we will show that p > (a!/2nlognV8 . 


Since each of the regions V,NH, and V NH, can contain at most 17/8 
leaves, it is clear that V,;NH, contains at least n2/2 leaves. Consider the n3/2 
subgraphs of M,, produced by eliminating the top (3/ogn)/4 levels of the row 
and column binary trees of M 2n- Each of these subgraphs is isomorphic. to 
M, ,14. By the pigeonhole principle, at least 1/2 of these subgraphs have at least 
one leafin V;NH,. If p< (a!/*nlogn/8 (otherwise we are done), then at most 
4p< (a!/nlogn/2 edges can cross the boundary of V,;NH, . Thus at most 
(a//2nlogn)/2_ of the subproblems which have at least one leaf in V,NH;, can 
have some node or part of an edge outside V,;NH,. This means that at least 
(n3/2 - q!/2nlogn/2 copies of M),1 are wholly contained in V,NH, . 
Applying the inductive hypothesis, we conclude that V,QH, contains at least 


(13/2 - @!/2nlogn) Wn!) 72 > (an*log?n - 03/2n3/login) / 32 


(an?log?n/64 wire. 


IV 


rs) 


(The last inequality follows trivially from (*).) Thus V,;NH, has at least 
(an2log¢n/64 area and p > (a!“2nlogn)/8, as claimed. 


We next use the layout for M>,, to construct a drawing for the complete graph 
on n? nodes (namely, the n? leaves of M 2n)- No matter how the edges of the 
complete graph are drawn in the plane (e.g., they may cross or overlap), it is clear 
from Figure 8-1 that the sum of the lengths of all the edges (as measured in 
Euclidean space) is at least n4p/64 > (a!/2n>logn)/29 . This is due to the fact 
that n‘4/64 edges pass from region Vo toregion V, and that these regions are 
separated by a distance p. . 


Let L; denote the sum of the lengths of the edges in the ith levels of the binary 
trees of M,,,. Since every level i edge is traced over at most n>Z! times in the 
drawing of the complete graph, we can conclude that 


fogn 
S120 > (a! Wlogn/29 


el 


and thus that 
togn ; 
| pais > (a!/2n2logny29 . 
In particular, this means that 
. L; > (a!/2n?logn2'\2°B?*) 
3 Soo 
for some i < logn. (Recall that B = > j 2 .) Otherwise, 
: & 
L; < (a!/n*logn2)\K(2°B ?) 
for. 1 < i < logn and thus 
loan ¥ logy) 
Sep < S(a!/2nogn(29B:?2) 
az) ute 
< (a!/n7logn/2°, a contradiction. 
Using the straightforward relation 
Wn) > (nr) +L; 
where i has been chosen so that 
L; > (a!/2nZlogn2)A(2°B 2) , 


we can conclude that 
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2ia(nz')(logn - i)? + (a!n2logn2)\(2°B i) 
> an?log?n - 2ain?logn + (a!/n?logn2)A 2B?) 


> anlog?n . 


w(n) 


IV 


V 


(The last inequality follows trivially from (#*).) Thus wn) > Q(n2log?n) for alln OD 


Theorem 8-2: Any layout of the N-node 2-dimensional mesh of trees contains a 
wire of length at least Q(N'/2logN/loglogN). 


Proof: \t is sufficient to show that any layout for M) ,, contains a wire of length 
at least Q(nlogn/loglogn). Assume for the purposes of contradiction that this is not 
the case and consider a layout of M,,, for which the longest wire has length 
q<< O(nlogn/loglogn). Using arguments similar to those used to prove 
Theorem 5-2, we first show that (without loss of generality) the area of such a. 
layout is at most  O(g2log?n) << O(n?log4n) . 


Since every pair of nodes of M 5 is linked by a path of length at most 4/ogn, all 
of the nodes in the layout are contained in a 4glogn x 4qlogn square. At most 
l6glogn wires may leave and re-enter the square at various points along its 
boundary. Without increasing the lengths of any of these wires, it is possible to 
rewire the segments outside the square using at most O(q/og2n) additional area. 
Thus, the resulting layout for M 2n Will have maximum edge length qg and area at 
most O(q2/og?n). 


The proof is completed by observing that any layout of M) ,, with area less than 
O(n2/og*n) must have a wire of length at least Q(nlogn/loglogn). From the proof 


of Theorem 8-1, we know that S12! i> (a!n2logn2? . Thus either 
rai 
1) there isan i < 4loglogn such that L,; > (a!/2n*logn2)\(2!loglogn) , or 
2) there is an i> 4loglogn such that L; > (a!/2n*logn2)\K(2! pit) 


od 
where, as before, the constant B= 2 j *. Otherwise, 


togn 
212 = S121 i se 
uz ou, +1 
< al” 2n2logn)/2!9 -+ [(a!Z 2n?logn)/2!°B| "S72 
er Yoglogn +) 
< (a!/n*logn)/2? , a contradiction. 
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The second condition cannot possibly be true, however. If it were, the area of 
the layout would be at least 


Q(n2logn/i) 
> Q(n*log’n/(loglogn)?) 


> An*login) , a contradiction. 


q.: 


I 


IV 


V 


Thus the first condition must be true and there is an / such that L; > 
Q(n7logn2‘/loglogn) . Since there are n2‘*/ type i edges in M,,, we can conclude 
that at least one of them has length at least Q(nlogn/loglogn) O 


8.2 Lower Bounds for the Tree of Meshes 


Using the results of the previous section, it is easy to demonstrate the existence 
of planar graphs which cannot be laid out in linear area and which must have long 
wires. In particular, we can conclude the following. 


Theorem 8-3: The wire area of the N-node tree of meshes is at least 2(NlogN). 


Proof: As we showed in section 6.3.3b, the N-node 2-dimensional mesh of trees 
can. be embedded in an O(NlogN)-node tree of meshes, By Theorem 8-1, we can 
thus conclude that the wire area of the NiogN-node tree of meshes is at least 
Q(Nlog?N). . Equivalently, the wire area of the N-node tree of meshes is at least 
Q(NlogN). O . 


Theorem 8-4: Any layout of.the N-node augmented tree of meshes has a wire of 
length at least Q(N!/*/log!/2N). 


Proof: In the proof of Theorem 8-1, we showed that any layout of M),, has two 
leaves which are spaced at least Q(nlogn) distance apart. Since (as we showed in 
section 6.3.3b) Mz, can be embedded in 7,, so that the leaves of M),, are 
embedded in the leaves of T;,, we can observe that any layout of T>, also has 
two leaves which are spaced at least Q(n/ogn) distance apart. Since every pair of 
leaves in 75, are linked by a path of length at most O(/ogn) in T,', we can 
conclude that some edge of 7," has length at least Q(n) = Q(N//*/log'”7N) O 


Had we so desired, we could have proved both results directly, using arguments 
identical to the ones uscd to prove Theorem 8-1. 
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8.3. Lower Bounds for a Restricted Class of Binary Tree Layouts 


In [BK80], Brent and Kung considered layouts of N-node complete binary trees 
for which every leaf is located on the boundary of some convex region. In 
particular, they showed that the wire area of any such layout is at least Q(N/ogN). 
Recently, Patterson, Ruzzo and Snyder [PRS81] extended this result by showing 
that any such layout with area A must have some edge of length Q(N//og(A/N)) . 
In particular, this means that if A = O(NlogN), then there must be some edge of 
length Q(N//oglogN) but that if A = O(N’+®) for some e>0, then there must 
only be an edge of length Q(N//ogN). In what follows, we show how to use the 


techniques developed in this chapter to give short proofs of these facts. 


Theorem 8-5 (Brent and Kung [BK80]): Any layout of the N-node complete 
binary tree in which every leaf is on the boundary of some convex p-region requires 
Q(NlogN) area. 


Proof: Given any such layout, we first use the methods of section 8.1 to 


_ construct a layout of the complete graph on the n= @(JN) leaves of the tree. Since 


the leaves are on the boundary of some convex region, it is easily shown that the 
layout of K,, uses at least Q(n3) wire. 


Let L; denote the sum of the lengths of the edges in the ith level of the tree. As 
each ith level edge is traced over at most n22? times, we know that 


to 
S32! Ll < Q(n) 
h 
and thus that S12! > Q(n) . Using arguments similar to those in the proof of 


Theorem 8-1, we can conclude that L; > Q(n2/7i2) for at least one value of i. 
Letting wn) denote the wire area of the binary tree layout, we can see that 


Wn) > Awn24 + A(n2/i). 
Solving the recurrence, we find that wn) > Q(nlogn) = Q(NlogN) a 


Theorem 8-6 (Patterson, Ruzzo and Snyder [PRS81]): Any A-area layout of the 
N-node complete binary tree in which every leaf is on the boundary of a convex 


region has some edge of length Q(N/log(A/N)). 


Proof: The proof follows that of the preceding theorem until it is concluded that 


leg 


SL 2! > Q(n). Using methods similar to those used to prove Theorem 8-2, we 
al : 
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can then observe that one of the following conditions must be satisfied: 


1) there is an i < 2og(A/n) such that L; > Q(n2/log(A/n)) , or 


2) there is an i > 2log(A/n) such that L; > Q(n2/i) . 


The second condition cannot possibly hold since, if it did, the layout area would 
be at least L; > Q(n2i"7) which, for i > 2log(A/n) , means that 


A > QAYnlog*A/n)) 


> QA), a contradiction. 


Thus the first condition holds and we can conclude that there is an / such that 
L;> Q(n2Vlog(A/n)) . As there are only 2/+/ edges in the ith level, at least one of 


them must have length at least Q(n/log(A/n)) = Q(N/log(A/N)) O 


95 


CONCLUSION INDEX  REPERIENCES 


and ADDENDUM 


This empty page was substituted for a 
blank page tn the original document. 


CONCLUSION 


In Part I of the thesis, we described several new layouts for the shuffle-exchange 
graph. In particular, we found 


1) an asymptotically optimal la -area layout of the N-node shuffle- 
exchange graph, and 


2) practical layouts for small shuffle-exchange graphs. 


As a result, it should now be possible to construct large scale shuffle-exchange 
chips. The only remaining question is whether or not there is a layout of the N- 
node shuffle-exchange graph for which every wire has length at most. O(N/log’N). 
All known layouts have wires of length at least Q(N//ogN). 


In Part II of the thesis, we descibed techniques for finding good lower bounds 
on the crossing number, wire area, maximum edge crossing and maximum edge 
length of a variety of VLSI networks. In particular, we applied these techniques to 
find 


1) an N-node planar graph which has layout area @(N/ogN) and maximum 
edge length O(N!”/log!2N), 


2) an N-node graph with an O(N/”*)-separator which has_ layout area 
@(Niog2N) and maximum edge length O(N/”7/ogN/loglogN), and 


3) an N-node graph with an O(N®)-separator (for a>J/2) which has maximum 
edge length O(N®). 


Thus we have answered all the open questions concerning bounds for layout 
area and maximum edge length of networks with known separators. We have only 
partially answered the corresponding questions for planar graphs, however. In 
particular, it would be of great interest to know whether or not every N-node 
planar graph can be laid out in O(N/ogN) area. 
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minimum number represented 1/8 


necklace /0 
odd node 22 


primary block of zeros 2/ 
primary node 22 
_ primary piece ofa necklace 26 
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radius ofa necklace J/8 
reverse edge 3] 
rightedge 65 


secondary block of zeros 2/ 
secondary node 22 
secondary piece ofanecklace 26 
separator J5/ 

shiftedge 30 

shuffleedge 3 
shuffle-exchange graph 3 
shuffle-shift graph 3/ 
shuffle-shift-reverse graph 3] 
shuffle-tree graph 65 

simple graph 83 
simultaneous separator 67 
sizeofanecklace 15 
sizeofanode 8 


Thompson model 2 
track 2 

transpose edge 32 
treeofmeshes 65 
typeiedge 80 
type i-jcrossing 80 
valueofanode 9 


wire area 5 


[BK 80] 


[BL81] 


[BLW76] 


[BO78] 
[CM8]]. 
[GO81a] 
[GO81b] 
[HL80] 
[JR81] 


[K70] 


[KL78] 
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ADDENDUM 


Much has been accomplished during the period of time between the submission 
of this thesis to the MIT math department in August of 1981 and the publication of 
the thesis as a technical report in June of 1982. In fact, so much has been 
discovered in the interim that it would be possible to write several additional thesis 
on the subject. As an aide to those who wish to know more about the new 
material, we have included below a brief bibliography of some of the recent work 
on layout strategies for VLSI. 
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